Abstract

A model of stochastic games where multiple controllers jointly control the evolution of the state 
of a dynamic system but have access to different information about the state and action processes 
is considered. The asymmetry of information among the controllers makes it difficult to compute 
or characterize Nash equilibria. Using common information among the controllers, the game with 
asymmetric information is shown to be equivalent to another game with symmetric information. Further, 
under certain conditions, a Markov state is identified for the equivalent symmetric information game 
and its Markov perfect equilibria are characterized. This characterization provides a backward induction 
algorithm to find Nash equilibria of the original game with asymmetric information in pure or behavioral 
strategies. Each step of this algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian 
game. The class of Nash equilibria of the original game that can be characterized in this backward manner 
is named common information based Markov perfect equilibria.

Index Terms 

Stochastic Games, Nash equilibrium, Markov Perfect Equilibrium, Backward Induction 

I. INTRODUCTION 

Stochastic games model situations where multiple players jointly control the evolution of 
the state of a stochastic dynamic system with each player trying to minimize its own costs. 
Stochastic games where all players have perfect state observation are well-studied [1]-[5]. In
such games, the symmetry of information among players implies that they all share the same 
uncertainty about the future states and future payoffs. However, a number of games arising in 
communication systems, queuing systems, economics, and in models of adversarial interactions
in control and communication systems involve players with different information about the state 
and action processes. Due to the asymmetry of information, the players have different beliefs 
about the current state and different uncertainties about future states and payoffs. As a result, the 
analytical tools for finding Nash equilibria for stochastic games with perfect state observation 
cannot be directly employed for games with asymmetric information. 

In the absence of a general framework for stochastic games with asymmetric information, 
several special models have been studied in the literature. In particular, zero-sum differential
games with linear dynamics and quadratic payoffs where the two players have different observation
processes were studied in [6], [7], [8]. A zero-sum differential game where one player's
observation at any time includes the other player's observation was considered in [9]. A zero-sum
differential game where one player has a noisy observation of the state while the other
controller has no observation of the state was considered in [10]. Discrete-time non-zero sum
LQG games with one step delayed sharing of observations were studied in [11], [12]. A one-step
delay observation and action sharing game was considered in [13]. A two-player finite game in
which the players do not obtain each other's observations and control actions was considered
in [14] and a necessary and sufficient condition for Nash equilibrium in terms of two coupled
dynamic programs was presented.

Obtaining equilibrium solutions for stochastic games when players make independent noisy
observations of the state and do not share all of their information (or even when they have
access to the same noisy observation as in [15]) has remained a challenging problem for general
classes of games. Identifying classes of game structures which would lead to tractable solutions
or feasible solution methods is therefore an important goal in that area. In this paper, we identify
one such class of nonzero-sum stochastic games, and obtain a characterization of a class of Nash
equilibrium strategies.

In stochastic games with perfect state observation, a subclass of Nash equilibria, namely
the Markov perfect equilibria, can be obtained by backward induction. The advantage of this
technique is that instead of searching for equilibrium in the (large) space of strategies, we only 
need to find Nash equilibrium in a succession of static games of complete information. 

Can a backward inductive decomposition be extended to games of asymmetric information? 
The general answer to this question is negative. However, we show that there is a class of 
asymmetric information games that are amenable to such a decomposition. The basic conceptual
observation underlying our results is the following: the essential impediment to applying
backward induction in asymmetric information games is the fact that a player's posterior beliefs 
about the system state and about other players' information may depend on the strategies used 
by the players in the past. If the nature of system dynamics and the information structure of 
the game ensures that the players' posterior beliefs are strategy independent, then a backward 
induction argument is feasible. We formalize this conceptual argument in this paper. 

We first use the common information among the controllers to show that the game with 
asymmetric information is equivalent to another game with symmetric information. Further, under 
the assumption of strategy independence of posterior beliefs, we identify a Markov state for the 
equivalent symmetric information game and characterize its Markov perfect equilibria using 
backward induction arguments. This characterization provides a backward induction algorithm 
to find Nash equilibria of the original game with asymmetric information. Each step of this 
algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class 
of Nash equilibria of the original game that can be characterized in this backward manner is
named common information based Markov perfect equilibria. For notational convenience, we 
consider games with only two controllers. Our results extend to games with n > 2 controllers 
in a straightforward manner. 

Our work is conceptually similar to the work in [16]. The authors in [16] considered a model
of a finite stochastic game with a discounted infinite-horizon cost function where each player has a
privately observed state. Under the assumption that player i's belief about other players' states
depends only on the current state of player i and does not depend on player i's strategy, [16]
presented a recursive algorithm to compute Nash equilibrium. Both our model and our main
assumptions differ from those in [16].

A. Notation 

Random variables are denoted by upper case letters; their realizations by the corresponding 
lower case letters. Random vectors are denoted by upper case bold letters and their realizations by 
lower case bold letters. Unless otherwise stated, the state, action and observations are assumed 
to be vector valued. Subscripts are used as time index. $X_{a:b}$ is shorthand for the vector
$(X_a, X_{a+1}, \ldots, X_b)$; if $a > b$, then $X_{a:b}$ is empty. $P(\cdot)$ is the probability of an event, $E(\cdot)$ is the
expectation of a random variable. For a collection of functions $g$, $P^g(\cdot)$ and $E^g(\cdot)$ denote that
the probability/expectation depends on the choice of functions in $g$. Similarly, for a probability
distribution $\pi$, $E^\pi(\cdot)$ denotes that the expectation is with respect to the distribution $\pi$. The notation
$1_{\{a=b\}}$ denotes 1 if the equality in the subscript is true and 0 otherwise. For a finite set $\mathcal{A}$, $\Delta(\mathcal{A})$
is the set of all probability mass functions over $\mathcal{A}$. For two random variables (or random vectors)
$X$ and $Y$, $P(X = x \mid Y)$ denotes the conditional probability of the event $\{X = x\}$ given $Y$. This
is a random variable whose realization depends on the realization of $Y$.

When dealing with collections of random variables, we will at times treat the collection as a 
random vector of appropriate dimension. At other times, it will be convenient to think of different 
collections of random variables as sets on which one can define the usual set operations. For 
example, consider random vectors $\mathbf{A} = (A_1, A_2, A_3)$ and $\tilde{\mathbf{A}} = (A_1, A_2)$. Then, treating $\mathbf{A}$ and $\tilde{\mathbf{A}}$
as sets would allow us to write $\mathbf{A} \setminus \tilde{\mathbf{A}} = \{A_3\}$.

B. Organization 

The rest of this paper is organized as follows. We present our model of a stochastic game
with asymmetric information in Section II. We present several special cases of our model in
Section III. We prove our main results in Section IV. We extend our arguments to consider
behavioral strategies in Section V. We examine the importance of our assumptions in Section VI.
Finally, we conclude in Section VII.



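II. The Basic Game G1

A. The Primitive Random Variables and the Dynamic System

We consider a collection of finitely-valued, mutually independent random vectors
$(\mathbf{X}_1, \mathbf{W}^0_1, \mathbf{W}^0_2, \ldots, \mathbf{W}^0_{T-1}, \mathbf{W}^1_1, \mathbf{W}^1_2, \ldots, \mathbf{W}^1_T, \mathbf{W}^2_1, \mathbf{W}^2_2, \ldots, \mathbf{W}^2_T)$
with known probability mass functions. These random variables are called the primitive random variables.

We consider a discrete-time dynamic system with 2 controllers. For any time $t$, $t = 1, 2, \ldots, T$,
$\mathbf{X}_t \in \mathcal{X}_t$ denotes the state of the system at time $t$ and $\mathbf{U}^i_t \in \mathcal{U}^i_t$ denotes the control action of
controller $i$, $i = 1, 2$, at time $t$. The state of the system evolves according to

$$\mathbf{X}_{t+1} = f_t(\mathbf{X}_t, \mathbf{U}^1_t, \mathbf{U}^2_t, \mathbf{W}^0_t). \tag{1}$$

There are two observation processes, $\mathbf{Y}^1_t \in \mathcal{Y}^1_t$ and $\mathbf{Y}^2_t \in \mathcal{Y}^2_t$, where

$$\mathbf{Y}^i_t = h^i_t(\mathbf{X}_t, \mathbf{W}^i_t), \quad i = 1, 2. \tag{2}$$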

B. The Data Available to Controllers

At any time $t$, the vector $\mathbf{I}^i_t$ denotes the total data available to controller $i$ at time $t$. The
vector $\mathbf{I}^i_t$ is a subset of the collection of potential observables of the system at time $t$, that
is, $\mathbf{I}^i_t \subset \{\mathbf{Y}^1_{1:t}, \mathbf{Y}^2_{1:t}, \mathbf{U}^1_{1:t-1}, \mathbf{U}^2_{1:t-1}\}$. We divide the total data into two components: private
information $\mathbf{P}^i_t$ and common information $\mathbf{C}_t$. Thus, $\mathbf{I}^i_t = (\mathbf{P}^i_t, \mathbf{C}_t)$. As their names suggest, the
common information is available to both controllers whereas private information is available
only to one controller. Clearly, this separation of information into private and common parts can
always be done. In some cases, common or private information may even be empty. For example,
if $\mathbf{I}^1_t = \mathbf{I}^2_t = \{\mathbf{Y}^1_{1:t}, \mathbf{Y}^2_{1:t}, \mathbf{U}^1_{1:t-1}, \mathbf{U}^2_{1:t-1}\}$, that is, if both controllers have access to all observations
and actions, then $\mathbf{C}_t = \mathbf{I}^1_t = \mathbf{I}^2_t$ and $\mathbf{P}^1_t = \mathbf{P}^2_t = \emptyset$. On the other hand, if $\mathbf{I}^i_t = \mathbf{Y}^i_{1:t}$, for $i = 1, 2$,
then $\mathbf{C}_t = \emptyset$ and $\mathbf{P}^i_t = \mathbf{I}^i_t$. Games where all information is common to both controllers are
referred to as symmetric information games.

We denote the set of possible realizations of $\mathbf{P}^i_t$ as $\mathcal{P}^i_t$ and the set of possible realizations of
$\mathbf{C}_t$ as $\mathcal{C}_t$. Controller $i$ chooses action $\mathbf{U}^i_t$ as a function of the total data $(\mathbf{P}^i_t, \mathbf{C}_t)$ available to it.
Specifically, for each controller $i$,

$$\mathbf{U}^i_t = g^i_t(\mathbf{P}^i_t, \mathbf{C}_t), \tag{3}$$

where $g^i_t$, referred to as the control law at time $t$, can be any function of private and common
information. The collection $g^i = (g^i_1, \ldots, g^i_T)$ is called the control strategy of controller $i$ and
the pair of control strategies for the two controllers $(g^1, g^2)$ is called a strategy profile. For a
given strategy profile, the overall cost of controller $i$ is given as

$$J^i(g^1, g^2) := E\Big[\sum_{t=1}^{T} c^i(\mathbf{X}_t, \mathbf{U}^1_t, \mathbf{U}^2_t)\Big], \tag{4}$$

where the expectation on the right hand side of (4) is with respect to the probability measure on
the state and action processes induced by the choice of strategies $g^1, g^2$ on the left hand side of
(4). A strategy profile $(g^1, g^2)$ is called a Nash equilibrium if no controller can lower its total
expected cost by unilaterally changing its strategy, that is,

$$J^1(g^1, g^2) \le J^1(\tilde{g}^1, g^2), \quad \text{and} \quad J^2(g^1, g^2) \le J^2(g^1, \tilde{g}^2), \tag{5}$$

for all strategies $\tilde{g}^1, \tilde{g}^2$. We refer to the above game as game G1.
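
As a small illustration of condition (5), the sketch below checks the Nash property when each controller has finitely many strategies and the expected costs $J^i(g^1, g^2)$ have already been tabulated. The cost tables and the helper name `is_nash` are hypothetical and serve only to make the inequality concrete; they are not part of the model above.

```python
from itertools import product


def is_nash(J1, J2, s1, s2):
    """Condition (5): no unilateral deviation lowers a controller's cost."""
    ok1 = all(J1[s1][s2] <= J1[d][s2] for d in range(len(J1)))
    ok2 = all(J2[s1][s2] <= J2[s1][d] for d in range(len(J2[s1])))
    return ok1 and ok2


# Hypothetical expected costs J^i(g^1, g^2): rows index controller 1's
# strategies, columns index controller 2's strategies.
J1 = [[1.0, 3.0],
      [2.0, 0.5]]
J2 = [[1.0, 2.0],
      [3.0, 0.5]]

equilibria = [(a, b) for a, b in product(range(2), range(2))
              if is_nash(J1, J2, a, b)]
print(equilibria)
```

In the actual game the strategy sets are sets of functions of $(\mathbf{P}^i_t, \mathbf{C}_t)$ and are far too large to tabulate, which is what motivates the backward-induction approach developed later.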

Remark 1 The system dynamics and the observation model (that is, the functions $f_t$, $h^1_t$, $h^2_t$ in
(1) and (2)), the statistics of the primitive random variables, the information structure of the
game and the cost functions are assumed to be common knowledge among the controllers. □

C. Evolution of Common and Private Information 

Assumption 1 We assume that the common and private information evolve over time as follows: 

1) The common information $\mathbf{C}_t$ is increasing with time, that is, $\mathbf{C}_t \subset \mathbf{C}_{t+1}$ for all $t$. Let
$\mathbf{Z}_{t+1} = \mathbf{C}_{t+1} \setminus \mathbf{C}_t$ be the increment in common information from time $t$ to $t+1$. Thus,
$\mathbf{C}_{t+1} = \{\mathbf{C}_t, \mathbf{Z}_{t+1}\}$. Further,

$$\mathbf{Z}_{t+1} = \zeta_{t+1}(\mathbf{P}^1_t, \mathbf{P}^2_t, \mathbf{U}^1_t, \mathbf{U}^2_t, \mathbf{Y}^1_{t+1}, \mathbf{Y}^2_{t+1}), \tag{6}$$

where $\zeta_{t+1}$ is a fixed transformation.

2) The private information evolves according to the equation

$$\mathbf{P}^i_{t+1} = \xi^i_{t+1}(\mathbf{P}^i_t, \mathbf{U}^i_t, \mathbf{Y}^i_{t+1}), \tag{7}$$

where $\xi^i_{t+1}$, $i = 1, 2$, are fixed transformations.

Equation (6) states that the increment in common information is a function of the "new"
variables generated between $t$ and $t+1$, that is, the actions taken at $t$ and the observations made
at $t+1$, and the "old" variables that were part of private information at time $t$. Equation (7)
implies that the evolution of private information at the two controllers is influenced by different
observations and actions.



D. Common Information Based Conditional Beliefs 

A key concept in our analysis is the belief about the state and the private information
conditioned on the common information of both controllers. Formally, at any time $t$, given
the control laws from time 1 to $t-1$, we define the common information based conditional
belief as follows:

$$\Pi_t(\mathbf{x}_t, \mathbf{p}^1_t, \mathbf{p}^2_t) := P^{g^1_{1:t-1}, g^2_{1:t-1}}(\mathbf{X}_t = \mathbf{x}_t, \mathbf{P}^1_t = \mathbf{p}^1_t, \mathbf{P}^2_t = \mathbf{p}^2_t \mid \mathbf{C}_t) \quad \text{for all } \mathbf{x}_t, \mathbf{p}^1_t, \mathbf{p}^2_t, \tag{8}$$

where we use the superscript $g^1_{1:t-1}, g^2_{1:t-1}$ in the RHS of (8) to emphasize that the conditional
belief depends on the past control laws. Note that $\Pi_t(\cdot, \cdot, \cdot)$ is a $|\mathcal{X}_t \times \mathcal{P}^1_t \times \mathcal{P}^2_t|$-dimensional
random vector whose realization depends on the realization of $\mathbf{C}_t$. A realization of $\Pi_t$ is denoted
by $\pi_t$.

Given control laws $g^1_t, g^2_t$, we define the following partial functions:

$$\Gamma^1_t = g^1_t(\cdot, \mathbf{C}_t), \qquad \Gamma^2_t = g^2_t(\cdot, \mathbf{C}_t).$$

These partial functions are functions from the private information of a controller to its control
action. These are random functions whose realizations depend on the realization of the random
vector $\mathbf{C}_t$. The following lemma describes the evolution of the common information based
conditional belief using these partial functions.

Lemma 1 Consider any choice of control laws $g^1_{1:t}, g^2_{1:t}$. Let $\pi_t$ be the realization of the common
information based conditional belief at time $t$, let $\mathbf{c}_t$ be the realization of the common information
at time $t$, let $\gamma^i_t = g^i_t(\cdot, \mathbf{c}_t)$, $i = 1, 2$, be the corresponding realizations of the partial functions
at time $t$, and $\mathbf{z}_{t+1}$ be the realization of the increment in common information (see Assumption
1). Then, the realization of the conditional belief at time $t+1$ is given as

$$\pi_{t+1} = F_t(\pi_t, \gamma^1_t, \gamma^2_t, \mathbf{z}_{t+1}), \tag{9}$$

where $F_t$ is a fixed transformation that does not depend on the control strategies. □

Proof: See Appendix A. ■
Lemma 1 states that the evolution of the conditional belief $\Pi_t$ is governed by the partial
functions of control laws at time $t$. This lemma relies on Assumption 1 made earlier about
the evolution of common and private information. We now introduce the following critical
assumption that eliminates the dependence of $\Pi_t$ on the control laws.

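Assumption 2 (Strategy Independence of Beliefs) Consider any time $t$, any choice of control
laws $g^1_{1:t-1}, g^2_{1:t-1}$ and any realization of common information $\mathbf{c}_t$ that has a non-zero probability
under $g^1_{1:t-1}, g^2_{1:t-1}$. Consider any other choice of control laws $\tilde{g}^1_{1:t-1}, \tilde{g}^2_{1:t-1}$ which also gives a
non-zero probability to $\mathbf{c}_t$. Then, we assume that

$$P^{g^1_{1:t-1}, g^2_{1:t-1}}(\mathbf{X}_t = \mathbf{x}_t, \mathbf{P}^1_t = \mathbf{p}^1_t, \mathbf{P}^2_t = \mathbf{p}^2_t \mid \mathbf{c}_t) = P^{\tilde{g}^1_{1:t-1}, \tilde{g}^2_{1:t-1}}(\mathbf{X}_t = \mathbf{x}_t, \mathbf{P}^1_t = \mathbf{p}^1_t, \mathbf{P}^2_t = \mathbf{p}^2_t \mid \mathbf{c}_t),$$

for all $\mathbf{x}_t, \mathbf{p}^1_t, \mathbf{p}^2_t$.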

Equivalently, the evolution of the common information based conditional belief described in
Lemma 1 depends only on the increment in common information, that is, (9) can be written as

$$\pi_{t+1} = F_t(\pi_t, \mathbf{z}_{t+1}), \tag{10}$$

where $F_t$ is a fixed transformation that does not depend on the control strategies.

Remark 2 Assumption 2 is somewhat related to the notion of one-way separation in stochastic
control, that is, the estimation (of the state in standard stochastic control and of the state and
private information in Assumption 2) is independent of the control strategy. □

III. Games satisfying Assumptions 1 and 2

Before proceeding with further analysis, we first describe some instances of G1 where the
nature of the dynamic system and the private and common information implies that Assumptions
1 and 2 hold.

A. One-Step Delayed Information Sharing Pattern 

Consider the instance of G1 where the common information at any time $t$ is given as
$\mathbf{C}_t = \{\mathbf{Y}^1_{1:t-1}, \mathbf{Y}^2_{1:t-1}, \mathbf{U}^1_{1:t-1}, \mathbf{U}^2_{1:t-1}\}$ and the private information is given as $\mathbf{P}^i_t = \mathbf{Y}^i_t$. Thus,
$\mathbf{Z}_{t+1} := \mathbf{C}_{t+1} \setminus \mathbf{C}_t = \{\mathbf{Y}^1_t, \mathbf{Y}^2_t, \mathbf{U}^1_t, \mathbf{U}^2_t\}$. This information structure can be interpreted as the case where
all observations and actions are shared among controllers with one step delay.
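
To make the link with Assumption 1 concrete, the following sketch writes the one-step delayed sharing pattern in the form of the update maps in (6) and (7) and checks, on a short hypothetical trajectory, that iterating the maps reproduces the common and private information defined directly above. The function names `zeta` and `xi` and the sample data are illustrative assumptions.

```python
def zeta(p1, p2, u1, u2, y1_next, y2_next):
    """Increment Z_{t+1} as in (6). With P^i_t = Y^i_t it is (Y^1_t, Y^2_t, U^1_t, U^2_t)."""
    return (p1, p2, u1, u2)


def xi(p, u, y_next):
    """Private information update as in (7): P^i_{t+1} = Y^i_{t+1}."""
    return y_next


# Hypothetical realizations of observations and actions for T = 3.
y1, y2 = ["a", "b", "c"], [0, 1, 0]
u1, u2 = ["L", "R", "L"], ["R", "R", "L"]
T = 3

common = []                 # C_1 is empty
p1, p2 = y1[0], y2[0]       # P^i_1 = Y^i_1
for t in range(T - 1):      # zero-based index t corresponds to time t + 1
    common.append(zeta(p1, p2, u1[t], u2[t], y1[t + 1], y2[t + 1]))
    p1, p2 = xi(p1, u1[t], y1[t + 1]), xi(p2, u2[t], y2[t + 1])

# Direct definition of C_T: everything up to time T - 1.
direct = [(y1[k], y2[k], u1[k], u2[k]) for k in range(T - 1)]
print(common == direct, (p1, p2))
```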

Lemma 2 The game with one-step delayed sharing information pattern described above satisfies
Assumptions 1 and 2. □

Proof: See Appendix F$^{1}$. ■

A special case of the above information structure is the situation where the state $\mathbf{X}_t = (\mathbf{X}^1_t, \mathbf{X}^2_t)$
and controller $i$'s observation $\mathbf{Y}^i_t = \mathbf{X}^i_t$. A game with this information structure was considered
in [13]. It is interesting to note that Assumption 2 is not true if information is shared with delays
larger than one time step [17].

1 Appendices F-J are included in the Supplementary Material section at the end of the paper.

B. Information Sharing with One-Directional-One-Step Delay

Similar to the one-step delay case, we consider the situation where all observations of controller
1 are available to controller 2 with no delay while the observations of controller 2 are available
to controller 1 with one-step delay. All past control actions are available to both controllers.
That is, in this case, $\mathbf{C}_t = \{\mathbf{Y}^1_{1:t}, \mathbf{Y}^2_{1:t-1}, \mathbf{U}^1_{1:t-1}, \mathbf{U}^2_{1:t-1}\}$, $\mathbf{Z}_{t+1} = \{\mathbf{Y}^1_{t+1}, \mathbf{Y}^2_t, \mathbf{U}^1_t, \mathbf{U}^2_t\}$, controller
1 has no private information and the private information of controller 2 is $\mathbf{P}^2_t = \mathbf{Y}^2_t$.

Lemma 3 The game with one-directional-one-step delayed sharing information pattern described
above satisfies Assumptions 1 and 2. □

Proof: See Appendix G. ■

C. State Controlled by One Controller with Asymmetric Delay Sharing 

Case A: Consider the special case of G1 where the state dynamics are controlled only by
controller 1, that is,

$$\mathbf{X}_{t+1} = f_t(\mathbf{X}_t, \mathbf{U}^1_t, \mathbf{W}^0_t).$$

Assume that the information structure is given as:

$$\mathbf{C}_t = \{\mathbf{Y}^1_{1:t}, \mathbf{Y}^2_{1:t-d}, \mathbf{U}^1_{1:t-1}\}, \quad \mathbf{P}^1_t = \emptyset, \quad \mathbf{P}^2_t = \mathbf{Y}^2_{t-d+1:t}.$$

That is, controller 1's observations are available to controller 2 instantly while controller 2's
observations are available to controller 1 with a delay of d > 1 time steps.

Case B: Similar to the above case, consider the situation where the state dynamics are still
controlled only by controller 1 but the information structure is:

$$\mathbf{C}_t = \{\mathbf{Y}^1_{1:t-1}, \mathbf{Y}^2_{1:t-d}, \mathbf{U}^1_{1:t-1}\}, \quad \mathbf{P}^1_t = \mathbf{Y}^1_t, \quad \mathbf{P}^2_t = \mathbf{Y}^2_{t-d+1:t}.$$

Lemma 4 The games described in Cases A and B satisfy Assumptions 1 and 2. □

Proof: See Appendix H. ■

D. An Information Structure with Global and Local States

Noiseless Observations: We now consider the information structure described in [18]. In this
example, the state $\mathbf{X}_t$ has three components: a global state $\mathbf{X}^0_t$ and a local state $\mathbf{X}^i_t$ for each
controller. The state evolution is given by the following equation:

$$\mathbf{X}_{t+1} = f_t(\mathbf{X}^0_t, \mathbf{U}^1_t, \mathbf{U}^2_t, \mathbf{W}^0_t). \tag{11}$$

Note that the dynamics depend on the current global state $\mathbf{X}^0_t$ but not on the current local states.
Each controller has access to the global state process $\mathbf{X}^0_{1:t}$ and its current local state $\mathbf{X}^i_t$. In
addition, each controller knows the past actions of all controllers. Thus, the common and private
information in this case are:

$$\mathbf{C}_t = \{\mathbf{X}^0_{1:t}, \mathbf{U}^1_{1:t-1}, \mathbf{U}^2_{1:t-1}\}, \quad \mathbf{P}^i_t = \{\mathbf{X}^i_t\}.$$

It is straightforward to verify that Assumption 1 holds for this case.

For a realization $\{\mathbf{x}^0_{1:t}, \mathbf{u}^1_{1:t-1}, \mathbf{u}^2_{1:t-1}\}$ of the common information, the common information
based belief in this case is given as

$$\pi_t(\mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2) = P^{g^1_{1:t-1}, g^2_{1:t-1}}(\mathbf{X}^0_t = \mathbf{x}^0, \mathbf{X}^1_t = \mathbf{x}^1, \mathbf{X}^2_t = \mathbf{x}^2 \mid \mathbf{x}^0_{1:t}, \mathbf{u}^1_{1:t-1}, \mathbf{u}^2_{1:t-1})$$

$$= 1_{\{\mathbf{x}^0_t = \mathbf{x}^0\}} \, P(\mathbf{X}^1_t = \mathbf{x}^1, \mathbf{X}^2_t = \mathbf{x}^2 \mid \mathbf{x}^0_{t-1:t}, \mathbf{u}^1_{t-1}, \mathbf{u}^2_{t-1}). \tag{12}$$

It is easy to verify that the above belief depends only on the statistics of $\mathbf{W}^0_{t-1}$ and is therefore
independent of control laws. Thus, Assumption 2 also holds for this case.

Noisy Observations: We can also consider a modification of the above scenario where both
controllers have a common, noisy observation $\mathbf{Y}^0_t = h_t(\mathbf{X}^0_t, \mathbf{W}^1_t)$ of the global state. That is,

$$\mathbf{C}_t = \{\mathbf{Y}^0_{1:t}, \mathbf{U}^1_{1:t-1}, \mathbf{U}^2_{1:t-1}\}, \quad \mathbf{P}^i_t = \{\mathbf{X}^i_t\}, \quad \mathbf{Z}_{t+1} = \{\mathbf{Y}^0_{t+1}, \mathbf{U}^1_t, \mathbf{U}^2_t\}.$$

Lemma 5 The game with the information pattern described above satisfies Assumptions 1 and
2. □

Proof: See Appendix I. ■

E. Uncontrolled State Process

Consider a state process whose evolution does not depend on the control actions, that is, the
system state evolves as

$$\mathbf{X}_{t+1} = f_t(\mathbf{X}_t, \mathbf{W}^0_t). \tag{13}$$

Further, the common and private information evolve as follows:

1) $\mathbf{C}_{t+1} = \{\mathbf{C}_t, \mathbf{Z}_{t+1}\}$ and

$$\mathbf{Z}_{t+1} = \zeta_{t+1}(\mathbf{P}^1_t, \mathbf{P}^2_t, \mathbf{Y}^1_{t+1}, \mathbf{Y}^2_{t+1}), \tag{14}$$

where $\zeta_{t+1}$ is a fixed transformation.

2) The private information evolves according to the equation

$$\mathbf{P}^i_{t+1} = \xi^i_{t+1}(\mathbf{P}^i_t, \mathbf{Y}^i_{t+1}), \tag{15}$$

where $\xi^i_{t+1}$, $i = 1, 2$, are fixed transformations.

Note that while control actions do not affect the state evolution, they still affect the costs.

Lemma 6 The game G1 with an uncontrolled state process described above satisfies Assumptions
1 and 2. □

Proof: See Appendix J. ■
As an example of this case, consider the information structure where the two controllers share
their observations about an uncontrolled state process with a delay of d > 1 time steps. In
this case, the common information is $\mathbf{C}_t = \{\mathbf{Y}^1_{1:t-d}, \mathbf{Y}^2_{1:t-d}\}$ and the private information is
$\mathbf{P}^i_t = \mathbf{Y}^i_{t-d+1:t}$.
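
The strategy independence claimed in Lemma 6 can be seen in a short sketch of the belief update (10) for a one-step-delay instance of the model (13)-(15), in which $\mathbf{P}^i_t = \mathbf{Y}^i_t$ and $\mathbf{Z}_{t+1} = (\mathbf{Y}^1_t, \mathbf{Y}^2_t)$. The transition and observation tables are hypothetical, and the observation noises of the two controllers are taken to be independent, as the primitive random variables are. No control law appears anywhere in the update.

```python
def belief_update(pi, z, trans, obs1, obs2):
    """One step of F_t(pi, z) when Z_{t+1} = (Y^1_t, Y^2_t) and P^i_t = Y^i_t.

    pi maps (x, y1, y2) to a probability; trans[x][x_next] is the state
    transition pmf; obs_i[x][y] is the pmf of controller i's observation.
    """
    y1, y2 = z
    # Condition on the observations that have just become common knowledge.
    weights = {x: p for (x, a, b), p in pi.items() if (a, b) == (y1, y2)}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("z has zero probability under pi")
    new_pi = {}
    for x, w in weights.items():
        for x_next, p_move in trans[x].items():
            for a, pa in obs1[x_next].items():
                for b, pb in obs2[x_next].items():
                    key = (x_next, a, b)
                    new_pi[key] = new_pi.get(key, 0.0) + (w / total) * p_move * pa * pb
    return new_pi


# Hypothetical two-state example.
trans = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
obs1 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
obs2 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}
prior = {0: 0.5, 1: 0.5}
pi_1 = {(x, a, b): prior[x] * obs1[x][a] * obs2[x][b]
        for x in prior for a in (0, 1) for b in (0, 1)}

pi_2 = belief_update(pi_1, (0, 0), trans, obs1, obs2)
marginal_x0 = sum(p for (x, a, b), p in pi_2.items() if x == 0)
print(marginal_x0)
```

The update first conditions the current belief on the newly revealed observations, then propagates the state through (13) and attaches the fresh private observations.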

F. Symmetric Information Game 

Consider the case when all observations and actions are available to both controllers, that is,
$\mathbf{I}^1_t = \mathbf{I}^2_t = \mathbf{C}_t = \{\mathbf{Y}^1_{1:t}, \mathbf{Y}^2_{1:t}, \mathbf{U}^1_{1:t-1}, \mathbf{U}^2_{1:t-1}\}$ and there is no private information. The common
information based belief in this case is $\pi_t(\mathbf{x}_t) = P^{g^1_{1:t-1}, g^2_{1:t-1}}(\mathbf{X}_t = \mathbf{x}_t \mid \mathbf{y}^1_{1:t}, \mathbf{y}^2_{1:t}, \mathbf{u}^1_{1:t-1}, \mathbf{u}^2_{1:t-1})$.
$\pi_t$ is the same as the information state in centralized stochastic control, which is known to
be control strategy independent and which satisfies an update equation of the form required
in Assumption 2 [19]. A related case with perfect state observations is the situation where
$\mathbf{I}^1_t = \mathbf{I}^2_t = \mathbf{X}_{1:t}$.

G. Symmetrically Observed Controlled State and Asymmetrically Observed Uncontrolled State

A combination of the previous two scenarios is the situation when the state $\mathbf{X}_t$ consists of two
independent components: a controlled component $\mathbf{X}^a_t$ and an uncontrolled component $\mathbf{X}^b_t$. Both
components are observed through noisy channels. The observations about the controlled state
as well as the past actions are common to both controllers whereas the information about the
uncontrolled state satisfies the model of Section III-E. The common information based conditional
belief can then be factored into two independent components each of which satisfies an update
equation of the form required by Assumption 2.



IV. Main Results

Our goal in this section is to show that under Assumptions 1 and 2, a class of equilibria of
the game G1 can be characterized in a backward inductive manner that resembles the backward
inductive characterization of Markov perfect equilibria of symmetric information games with
perfect state observation. However, in order to do so, we have to view our asymmetric information
game as a symmetric information game by introducing "virtual players" that make decisions
based on the common information. This section describes this change of perspective and how it
can be used to characterize a class of Nash equilibria.

We reconsider the model of game G1. We assume that controller $i$ is replaced by a virtual
player $i$ (VP $i$). The system operates as follows: At time $t$, the data available to each virtual
player is the common information $\mathbf{C}_t$. The virtual player $i$ selects a function $\Gamma^i_t$ from $\mathcal{P}^i_t$ to $\mathcal{U}^i_t$
according to a decision rule $\chi^i_t$,

$$\Gamma^i_t = \chi^i_t(\mathbf{C}_t).$$

Note that under a given decision rule $\chi^i_t$, $\Gamma^i_t$ is a random function since $\mathbf{C}_t$ is a random vector.
We will use $\gamma^i_t$ to denote a realization of $\Gamma^i_t$. We will refer to $\Gamma^i_t$ as the prescription selected by
virtual player $i$ at time $t$. Once the virtual player has chosen $\Gamma^i_t$, a control action $\mathbf{U}^i_t = \Gamma^i_t(\mathbf{P}^i_t)$
is applied to the system. $\chi^i := (\chi^i_1, \chi^i_2, \ldots, \chi^i_T)$ is called the strategy of the virtual player $i$. The
total cost of the virtual player $i$ is given as

$$\mathcal{J}^i(\chi^1, \chi^2) := E\Big[\sum_{t=1}^{T} c^i(\mathbf{X}_t, \mathbf{U}^1_t, \mathbf{U}^2_t)\Big], \tag{16}$$

where the expectation on the right hand side of (16) is with respect to the probability measure
on the state and action processes induced by the choice of strategies $\chi^1, \chi^2$ on the left hand side
of (16). We refer to the game among the virtual players as game G2.

Remark 3 In case there is no private information, the function $\Gamma^i_t$ from $\mathcal{P}^i_t$ to $\mathcal{U}^i_t$ is interpreted
as simply a value in the set $\mathcal{U}^i_t$. □



A. Equivalence with Game Gl 

Theorem 1 Let $(g^1, g^2)$ be a Nash equilibrium of game G1. Define $\chi^i_t$ for $i = 1, 2$,
$t = 1, 2, \ldots, T$ as

$$\chi^i_t(\mathbf{c}_t) := g^i_t(\cdot, \mathbf{c}_t), \tag{17}$$

for each possible realization $\mathbf{c}_t$ of common information at time $t$. Then $(\chi^1, \chi^2)$ is a Nash
equilibrium of game G2. Conversely, if $(\chi^1, \chi^2)$ is a Nash equilibrium of game G2, then define
$g^i_t$ for $i = 1, 2$, $t = 1, 2, \ldots, T$ as

$$g^i_t(\cdot, \mathbf{c}_t) := \chi^i_t(\mathbf{c}_t), \tag{18}$$

for each possible realization $\mathbf{c}_t$ of common information at time $t$. Then $(g^1, g^2)$ is a Nash
equilibrium of game G1. □

Proof: It is clear that using (17), any controller strategy profile in game G1 can be transformed
to a corresponding virtual player strategy profile in game G2 without altering the behavior
of the dynamic system and in particular the values of the expected costs. If a virtual player
can reduce its costs by unilaterally deviating from $\chi^i$, then such a deviation must also exist
for the corresponding controller in G1. Therefore, equilibrium of controllers' strategies implies
equilibrium of corresponding virtual players' strategies. The converse can be shown using similar
arguments. ■
The game between the virtual players is a symmetric information game since they both make 
their decisions based only on the common information C t . In the next section, we identify a 
Markov state for this symmetric information game and characterize Markov perfect equilibria 
for this game. 
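
The correspondence in (17) and (18) is a currying of the control law in its common-information argument. The sketch below, with a hypothetical control law `g` on small integer alphabets, shows that a controller using $g^i_t$ and a virtual player using $\chi^i_t$ produce the same action for every pair of private and common information.

```python
def to_virtual_player(g):
    """Eq. (17): chi(c) is the partial function g(., c), i.e. a prescription."""
    return lambda c: (lambda p: g(p, c))


def to_controller(chi):
    """Eq. (18): g(p, c) applies the prescription chi(c) to the private information p."""
    return lambda p, c: chi(c)(p)


def g(p, c):
    """Hypothetical control law: p is private information, c is a tuple of common data."""
    return (p + len(c)) % 2


chi = to_virtual_player(g)
g_back = to_controller(chi)

checks = [g(p, c) == chi(c)(p) == g_back(p, c)
          for p in (0, 1) for c in ((), (1,), (1, 0))]
print(all(checks))
```

Because the action applied to the system is identical in both descriptions, the induced probability measure and expected costs are unchanged, which is the observation the proof of Theorem 1 rests on.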



September 18, 2012 DRAFT 

Preliminary Version - September 18, 2012 




B. Markov Perfect Equilibrium of G2 

We want to establish that the common information based conditional beliefs $\Pi_t$ (defined in
(8)) can serve as a Markov state for the game G2. Firstly, note that because of Assumption 2,
$\Pi_t$ depends only on the common information $\mathbf{C}_t$ and since both the virtual players know the
common information, the belief $\Pi_t$ is common knowledge among them. The following lemma
shows that $\Pi_t$ evolves as a controlled Markov process.

Lemma 7 From the virtual players' perspective, the process $\Pi_t$, $t = 1, 2, \ldots, T$ is a controlled
Markov process with the virtual players' prescriptions $\gamma^1_t, \gamma^2_t$, $t = 1, 2, \ldots, T$ as the controlling
actions, that is,

$$P(\Pi_{t+1} \mid \mathbf{c}_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = P(\Pi_{t+1} \mid \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = P(\Pi_{t+1} \mid \pi_t, \gamma^1_t, \gamma^2_t). \tag{19}$$

Proof: See Appendix B. ■

Following the development in [20], we next show that if one virtual player is using a strategy
that is measurable with respect to $\Pi_t$, then the other virtual player can select an optimal response
strategy measurable with respect to $\Pi_t$ as well.

Lemma 8 If virtual player $i$ is using a decision strategy that selects prescriptions only as a
function of the belief $\Pi_t$, that is,

$$\Gamma^i_t = \psi^i_t(\Pi_t),$$

$t = 1, \ldots, T$, then virtual player $j$ can also choose its prescriptions only as a function of the
belief $\Pi_t$ without any loss of performance. □

Proof: See Appendix C. ■

Lemmas 7 and 8 establish $\Pi_t$ as the Markov state for the game G2. We now define a Markov
perfect equilibrium for game G2. 

Definition 1 A strategy profile $(\psi^1, \psi^2)$ is said to be a Markov perfect equilibrium of game
G2 if (i) at each time $t$, the strategies select prescriptions only as a function of the common
information based belief $\Pi_t$ and (ii) the strategies form a Nash equilibrium for every sub-game
of G2 0. 

Given a Markov perfect equilibrium of G2, we can construct a corresponding Nash equilibrium
of game G1 using Theorem 1. We refer to the class of Nash equilibria of G1 that can be
constructed from the Markov perfect equilibria of G2 as the common information based Markov
perfect equilibria of G1.

Definition 2 A strategy profile $(g^1, g^2)$ of the form $\mathbf{U}^i_t = g^i_t(\mathbf{P}^i_t, \Pi_t)$, $i = 1, 2$, is called a common
information based Markov perfect equilibrium for game G1 if the corresponding strategies of
game G2 defined as

$$\psi^i_t(\pi_t) := g^i_t(\cdot, \pi_t),$$

form a Markov perfect equilibrium of G2. □

The following theorem provides a necessary and sufficient condition for a strategy profile to be 
a Markov perfect equilibrium of G2. 

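Theorem 2 Consider a strategy pair $(\psi^1, \psi^2)$ such that at each time $t$, the strategies select
prescriptions based only on the realization of the common information based belief $\pi_t$, that is,

$$\gamma^i_t = \psi^i_t(\pi_t), \quad i = 1, 2.$$

A necessary and sufficient condition for $(\psi^1, \psi^2)$ to be a Markov perfect equilibrium of G2 is
that they satisfy the following conditions:

1) For each possible realization $\pi$ of $\Pi_T$, define the value function for virtual player 1:

$$V^1_T(\pi) := \min_{\tilde{\gamma}^1} E\big[c^1(\mathbf{X}_T, \Gamma^1_T(\mathbf{P}^1_T), \Gamma^2_T(\mathbf{P}^2_T)) \mid \Pi_T = \pi, \Gamma^1_T = \tilde{\gamma}^1, \Gamma^2_T = \psi^2_T(\pi)\big]. \tag{20}$$

Then, $\psi^1_T(\pi)$ must be a minimizing $\tilde{\gamma}^1$ in the definition of $V^1_T(\pi)$. Similarly, define the value
function for virtual player 2:

$$V^2_T(\pi) := \min_{\tilde{\gamma}^2} E\big[c^2(\mathbf{X}_T, \Gamma^1_T(\mathbf{P}^1_T), \Gamma^2_T(\mathbf{P}^2_T)) \mid \Pi_T = \pi, \Gamma^1_T = \psi^1_T(\pi), \Gamma^2_T = \tilde{\gamma}^2\big]. \tag{21}$$

Then, $\psi^2_T(\pi)$ must be a minimizing $\tilde{\gamma}^2$ in the definition of $V^2_T(\pi)$.

2) For $t = T-1, \ldots, 1$ and for each possible realization $\pi$ of $\Pi_t$, define recursively the value
functions for virtual player 1:

$$V^1_t(\pi) := \min_{\tilde{\gamma}^1} E\big[c^1(\mathbf{X}_t, \Gamma^1_t(\mathbf{P}^1_t), \Gamma^2_t(\mathbf{P}^2_t)) + V^1_{t+1}(\Pi_{t+1}) \mid \Pi_t = \pi, \Gamma^1_t = \tilde{\gamma}^1, \Gamma^2_t = \psi^2_t(\pi)\big], \tag{22}$$

where $\Pi_{t+1} = F_t(\Pi_t, \mathbf{Z}_{t+1})$. Then, $\psi^1_t(\pi)$ must be a minimizing $\tilde{\gamma}^1$ in the definition of
$V^1_t(\pi)$. Similarly, define recursively the value functions for virtual player 2:

$$V^2_t(\pi) := \min_{\tilde{\gamma}^2} E\big[c^2(\mathbf{X}_t, \Gamma^1_t(\mathbf{P}^1_t), \Gamma^2_t(\mathbf{P}^2_t)) + V^2_{t+1}(\Pi_{t+1}) \mid \Pi_t = \pi, \Gamma^1_t = \psi^1_t(\pi), \Gamma^2_t = \tilde{\gamma}^2\big], \tag{23}$$

where $\Pi_{t+1} = F_t(\Pi_t, \mathbf{Z}_{t+1})$. Then, $\psi^2_t(\pi)$ must be a minimizing $\tilde{\gamma}^2$ in the definition of
$V^2_t(\pi)$. □

Proof: See Appendix D. ■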
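
The terminal-time conditions (20) and (21) can be checked by direct enumeration when the alphabets are small. In the sketch below the belief `pi`, the costs `c1` and `c2`, and the binary alphabets are all hypothetical; to keep the tables short, the state is taken to coincide with the pair of private informations. A prescription is stored as a tuple `g` with `g[p]` the action prescribed for private information `p`.

```python
from itertools import product

P1 = P2 = U = (0, 1)   # private-information and action alphabets

# Hypothetical belief pi on (p1, p2); the state is taken to be (p1, p2) itself.
pi = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}


def c1(p1, p2, u1, u2):
    return (u1 - p1) ** 2 + 0.5 * (u1 != u2)


def c2(p1, p2, u1, u2):
    return (u2 - p2) ** 2 + 0.5 * (u1 != u2)


def expected(cost, g1, g2):
    """Expected one-stage cost under belief pi for prescriptions g1, g2."""
    return sum(w * cost(p1, p2, g1[p1], g2[p2]) for (p1, p2), w in pi.items())


prescriptions = list(product(U, repeat=len(P1)))


def satisfies_terminal_conditions(g1, g2, tol=1e-12):
    best1 = min(expected(c1, d, g2) for d in prescriptions)   # value in (20)
    best2 = min(expected(c2, g1, d) for d in prescriptions)   # value in (21)
    return (expected(c1, g1, g2) <= best1 + tol
            and expected(c2, g1, g2) <= best2 + tol)


equilibria = [(g1, g2) for g1 in prescriptions for g2 in prescriptions
              if satisfies_terminal_conditions(g1, g2)]
print(equilibria)
```

Enumeration over all prescription pairs grows exponentially in the size of the private-information alphabets, so this is only meant to make the fixed-point structure of the conditions visible.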
Theorem 2 suggests that one could follow a backward inductive procedure to find equilibrium
strategies for the virtual players. Before describing this backward procedure in detail, we make
a simple but useful observation. In (20)-(23), since $\tilde{\gamma}^i$ enters the expectation only as $\tilde{\gamma}^i(\mathbf{P}^i_t)$,
it suggests that we may be able to carry out the minimization over $\tilde{\gamma}^i$ by separately minimizing
over $\tilde{\gamma}^i(\mathbf{p}^i)$ for all possible $\mathbf{p}^i$. This observation leads us to the backward induction procedure
described in the next section.
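
A quick numerical check of this observation, for a terminal-stage cost and a fixed prescription of the other player: minimizing over whole prescriptions by brute force gives the same value as choosing the action separately for each realization of the private information. The belief, the cost function and the alphabets below are hypothetical.

```python
from itertools import product

P1, P2, U = (0, 1, 2), (0, 1), (0, 1)
pi = {(p1, p2): 1.0 / 6 for p1 in P1 for p2 in P2}   # hypothetical uniform belief
gamma2 = {0: 1, 1: 0}                                # fixed prescription of player 2


def cost(p1, p2, u1, u2):
    """Hypothetical one-stage cost of player 1."""
    return (u1 - p1 % 2) ** 2 + 0.5 * (u1 == u2)


def expected(gamma1):
    return sum(w * cost(p1, p2, gamma1[p1], gamma2[p2])
               for (p1, p2), w in pi.items())


# Brute force over all |U|^|P1| prescriptions of player 1.
brute = min(expected(dict(zip(P1, us))) for us in product(U, repeat=len(P1)))

# Pointwise: pick gamma1(p1) separately for each realization p1.
pointwise = {
    p1: min(U, key=lambda u: sum(pi[p1, p2] * cost(p1, p2, u, gamma2[p2])
                                 for p2 in P2))
    for p1 in P1
}
print(brute, expected(pointwise))
```

The brute-force search visits $|\mathcal{U}|^{|\mathcal{P}^1|}$ prescriptions, while the pointwise search solves $|\mathcal{P}^1|$ problems of size $|\mathcal{U}|$ each.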

Remark 4 Note that if Assumption 2 were not true, then according to Lemma 1,
$\Pi_{t+1} = F_t(\Pi_t, \Gamma^1_t, \Gamma^2_t, \mathbf{Z}_{t+1})$. In this case, the entire prescription $\tilde{\gamma}^i$ will affect the second term in the
expectation in (22)-(23), and we could not hope to carry out the minimization over $\tilde{\gamma}^i$ by
separately minimizing over $\tilde{\gamma}^i(\mathbf{p}^i)$ for all possible $\mathbf{p}^i$. □

C. Backward Induction Algorithm for Finding Equilibrium 

We can now describe a backward inductive procedure to find a Markov perfect equilibrium 
of game G2 using a sequence of one-stage Bayesian games. We proceed as follows: 
Algorithm 1: 

1) At the terminal time $T$, for each realization $\pi$ of the common information based belief at
time $T$, we define a one-stage Bayesian game $SG_T(\pi)$ where

a) The probability distribution on $(X_T, P^1_T, P^2_T)$ is $\pi$.

b) Agent $i$ observes $P^i_T$ and chooses action $U^i_T$, $i = 1, 2$. (Agent $i$ can be thought of as the
same as controller $i$. We use a different name here in order to maintain the distinction
between games G1 and $SG_T(\pi)$.)

c) Agent $i$'s cost is $c^i(X_T, U^1_T, U^2_T)$, $i = 1, 2$.

A Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^i$, $i = 1, 2$, for the agents
which map their observation $P^i_T$ to their action $U^i_T$ such that for any realization $p^i$, $\gamma^i(p^i)$
is a solution of the minimization problem

$$\min_{u^i} E^{\pi}\big[c^i(X_T, u^i, \gamma^j(P^j_T)) \,\big|\, P^i_T = p^i\big],$$

where $j \neq i$ and the superscript $\pi$ denotes that the expectation is with respect to the
distribution $\pi$. (See [21], [22] for a definition of Bayesian Nash equilibrium.) If a Bayesian
Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of $SG_T(\pi)$ exists, denote the corresponding expected equilibrium
costs as $V^i_T(\pi)$, $i = 1, 2$ and define $\psi^i_T(\pi) := \gamma^{i*}$, $i = 1, 2$.

2) At time $t < T$, for each realization $\pi$ of the common information based belief at time $t$,
we define the one-stage Bayesian game $SG_t(\pi)$ where

a) The probability distribution on $(X_t, P^1_t, P^2_t)$ is $\pi$.

b) Agent $i$ observes $P^i_t$ and chooses action $U^i_t$, $i = 1, 2$.

c) Agent $i$'s cost is $c^i(X_t, U^1_t, U^2_t) + V^i_{t+1}(F_t(\pi, Z_{t+1}))$, $i = 1, 2$.

Recall that the belief for the next time step is $\Pi_{t+1} = F_t(\pi, Z_{t+1})$ and that $Z_{t+1}$ is the
increment in common information between times $t$ and $t+1$. A Bayesian Nash equilibrium of this
game is a pair of strategies $\gamma^i$, $i = 1, 2$, for the agents which map their observation $P^i_t$ to
their action $U^i_t$ such that for any realization $p^i$, $\gamma^i(p^i)$ is a solution of the minimization problem

$$\min_{u^i} E^{\pi}\big[c^i(X_t, u^i, \gamma^j(P^j_t)) + V^i_{t+1}(F_t(\pi, Z_{t+1})) \,\big|\, P^i_t = p^i\big],$$

where $j \neq i$, $i, j = 1, 2$, and $Z_{t+1}$ is the increment in common information generated
according to the state dynamics, the observation equations and the common information update
when control actions $U^i_t = u^i$ and $U^j_t = \gamma^j(P^j_t)$ are used.
The expectation is with respect to the distribution $\pi$. If a Bayesian Nash equilibrium $\gamma^{1*}, \gamma^{2*}$
of $SG_t(\pi)$ exists, denote the corresponding expected equilibrium costs as $V^i_t(\pi)$, $i = 1, 2$
and define $\psi^i_t(\pi) := \gamma^{i*}$, $i = 1, 2$.
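
Each step of Algorithm 1 solves a finite one-stage Bayesian game. The following is a minimal sketch of one way to do that by exhaustive enumeration of pure strategies; it is an illustration, not the authors' implementation. The function name `pure_bayesian_nash`, the dictionary encoding of the belief and the toy game are all hypothetical. The cost callback is assumed to already include any continuation term $V^i_{t+1}(F_t(\pi, Z_{t+1}))$.

```python
from itertools import product

def pure_bayesian_nash(prior, cost, types1, types2, actions1, actions2, tol=1e-12):
    """Enumerate pure-strategy Bayesian Nash equilibria of a two-agent one-stage game.

    prior: dict {(x, p1, p2): probability}, playing the role of the belief pi
    cost:  cost(i, x, u1, u2) -> agent i's cost (stage cost plus any continuation term)
    """
    def cond_cost(i, p, u, other):
        # Expected cost of agent i choosing u when its observation is p, up to the
        # nonnegative factor P(P^i = p), which does not change the minimizer.
        total = 0.0
        for (x, p1, p2), pr in prior.items():
            if i == 1 and p1 == p:
                total += pr * cost(1, x, u, other[p2])
            elif i == 2 and p2 == p:
                total += pr * cost(2, x, other[p1], u)
        return total

    def is_best(i, own, other, types, actions):
        return all(
            cond_cost(i, p, own[p], other)
            <= min(cond_cost(i, p, u, other) for u in actions) + tol
            for p in types
        )

    equilibria = []
    for a1 in product(actions1, repeat=len(types1)):
        g1 = dict(zip(types1, a1))
        for a2 in product(actions2, repeat=len(types2)):
            g2 = dict(zip(types2, a2))
            if is_best(1, g1, g2, types1, actions1) and is_best(2, g2, g1, types2, actions2):
                equilibria.append((g1, g2))
    return equilibria

# Hypothetical toy game: agent 1 sees a fair bit x and wants u1 = x;
# agent 2 has a single (uninformative) observation and wants u2 = u1.
toy_prior = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}

def toy_cost(i, x, u1, u2):
    return float(u1 != x) if i == 1 else float(u2 != u1)

toy_equilibria = pure_bayesian_nash(toy_prior, toy_cost, [0, 1], [0], [0, 1], [0, 1])
print(toy_equilibria)
```

The enumeration is exponential in the number of private-information realizations, so it is only meant for small examples. An empty result corresponds to the case where no pure strategy equilibrium of the one-stage game exists.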

Theorem 3 The strategies $\psi^1, \psi^2$ defined by the backward induction procedure described in
Algorithm 1 form a Markov perfect equilibrium of game G2. Consequently, strategies $g^1, g^2$
defined as

$$g^i_t(\cdot, \pi_t) := \psi^i_t(\pi_t),$$

$i = 1, 2$, $t = 1, 2, \ldots, T$ form a common information based Markov perfect equilibrium of game
G1.

Proof: To prove the result, we just need to observe that the strategies defined by the backward
induction procedure of Algorithm 1 satisfy the conditions of Theorem 2 and hence form a Markov
perfect equilibrium of game G2. See Appendix E for a more detailed proof. ■

D. An Example Illustrating Algorithm 1 

We consider an example of game G1 where the (scalar) state $X_t$ and the (scalar) control
actions $U^1_t, U^2_t$ take values in the set $\{0, 1\}$. The state evolves as a controlled Markov chain
depending on the two control actions according to the state transition probabilities:

$$P\{X_{t+1} = 0 \mid X_t = 0, U^1_t = U^2_t\} = \tfrac{1}{4},$$

$$P\{X_{t+1} = 1 \mid X_t = 1, U^1_t = U^2_t\} = \tfrac{1}{2},$$

$$P\{X_{t+1} = 0 \mid X_t = 0, U^1_t \neq U^2_t\} = P\{X_{t+1} = 0 \mid X_t = 1, U^1_t \neq U^2_t\} = \tfrac{2}{5}. \tag{24}$$

The initial state is assumed to be equi-probable, i.e., $P\{X_1 = 0\} = P\{X_1 = 1\} = 1/2$. The
first controller observes the state perfectly, while the second controller observes the state through
a binary symmetric channel with probability of error 1/3. Thus,

$$Y^2_t = \begin{cases} X_t & \text{with probability } 2/3, \\ 1 - X_t & \text{with probability } 1/3. \end{cases}$$

The controllers share the observations and actions with a delay of one time step. Thus, the
common information and private information at time step $t$ are given as

$$C_t = \{X_{1:t-1}, Y^2_{1:t-1}, U^1_{1:t-1}, U^2_{1:t-1}\}, \quad P^1_t = \{X_t\}, \quad P^2_t = \{Y^2_t\}.$$

In the equivalent game with virtual players, the decision of the $i$-th virtual player, $\Gamma^i_t$, is a function
that maps $\mathcal{P}^i_t = \{0, 1\}$ to $\mathcal{U}^i_t = \{0, 1\}$.

The common information based belief for this case is the belief on $(X_t, Y^2_t)$ given the common
information $x_{1:t-1}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1}$, that is,

$$\pi_t(x, y^2) = P\{X_t = x, Y^2_t = y^2 \mid x_{1:t-1}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1}\}$$
$$= P\{X_t = x \mid x_{t-1}, u^1_{t-1}, u^2_{t-1}\} \left(\tfrac{2}{3} 1_{\{y^2 = x\}} + \tfrac{1}{3} 1_{\{y^2 \neq x\}}\right). \tag{25}$$

The above equation implies that the distribution $\pi_t$ is completely specified by $x_{t-1}, u^1_{t-1}, u^2_{t-1}$.
That is,

$$\pi_t = F_{t-1}(x_{t-1}, u^1_{t-1}, u^2_{t-1}). \tag{26}$$

(Note that $F_{t-1}$ is a vector-valued function whose components are given by (25) for all $x, y^2 \in
\{0, 1\}$.) The cost functions $c^i(x, u^1, u^2)$ for various values of state and actions are described by
the following matrices:

For $X_t = 0$:

              U^2_t = 0    U^2_t = 1
U^1_t = 0       1,0          0,1
U^1_t = 1       0,1          0,0

For $X_t = 1$:

              U^2_t = 0    U^2_t = 1
U^1_t = 0       0,0          1,1
U^1_t = 1       0,1          1,0

where the rows in each matrix correspond to controller 1's actions and the columns correspond
to controller 2's actions. The first entry in each element of the cost matrix is controller 1's cost
and the second entry is controller 2's cost.
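
The example can be encoded in a few lines, which makes the belief map in (25)-(26) concrete. The sketch below reads the transition probabilities in (24) as 1/4, 1/2 and 2/5, and reads the cost matrices row by row with one matrix per state value. Both readings are assumptions about the typeset equations, chosen to agree with the values 3/5, 3/4 and 1/2 quoted when Algorithm 1 is applied. The names `p_next_zero`, `belief` and `COST` are hypothetical.

```python
from fractions import Fraction as Fr

def p_next_zero(x, u1, u2):
    """P{X_{t+1} = 0 | X_t = x, U^1_t = u1, U^2_t = u2}, as read from (24)."""
    if u1 != u2:
        return Fr(2, 5)
    return Fr(1, 4) if x == 0 else Fr(1, 2)

def belief(x_prev, u1_prev, u2_prev):
    """pi_t(x, y2) from (25)-(26), returned as a dict keyed by (x, y2)."""
    p0 = p_next_zero(x_prev, u1_prev, u2_prev)
    pi = {}
    for x in (0, 1):
        px = p0 if x == 0 else 1 - p0
        for y2 in (0, 1):
            # Binary symmetric observation channel with error probability 1/3.
            pi[(x, y2)] = px * (Fr(2, 3) if y2 == x else Fr(1, 3))
    return pi

# COST[x][u1][u2] = (controller 1's cost, controller 2's cost)
COST = {
    0: [[(1, 0), (0, 1)], [(0, 1), (0, 0)]],
    1: [[(0, 0), (1, 1)], [(0, 1), (1, 0)]],
}

# Collect the distinct beliefs reachable at any time t > 1.
reachable = {
    tuple(sorted(belief(x, u1, u2).items()))
    for x in (0, 1) for u1 in (0, 1) for u2 in (0, 1)
}
print(len(reachable))
```

Under these readings only a handful of distinct beliefs can occur after the first step, so a one-stage game has to be solved for each of those beliefs rather than for each realization of the common information.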

Applying Algorithm 1:
We now use Algorithm 1 for a two-stage version of the game described above.

1) At the terminal time step $T = 2$, for a realization $\pi$ of the common information based
belief at time 2, we define a one-stage game $SG_2(\pi)$ where

a) The probability distribution on $(X_2, Y^2_2)$ is $\pi$.

b) Agent 1 observes $X_2$ and selects an action $U^1_2$; Agent 2 observes $Y^2_2$ and selects $U^2_2$.

c) Agent $i$'s cost is $c^i(X_2, U^1_2, U^2_2)$, given by the matrices defined above.

A Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^1, \gamma^2$ such that
• For $x = 0, 1$, $\gamma^1(x)$ is a solution of $\min_{u^1} E^{\pi}[c^1(X_2, u^1, \gamma^2(Y^2_2)) \mid X_2 = x]$.
• For $y = 0, 1$, $\gamma^2(y)$ is a solution of $\min_{u^2} E^{\pi}[c^2(X_2, \gamma^1(X_2), u^2) \mid Y^2_2 = y]$.

It is easy to verify that

$$\gamma^1(x) := 1, \quad \gamma^2(y) := 1 \quad \text{for all } x, y \in \{0, 1\}$$

is a Bayesian Nash equilibrium of $SG_2(\pi)$. The expected equilibrium cost for agent $i$ is

$$V^i_2(\pi) = E^{\pi}[c^i(X_2, 1, 1)] = \begin{cases} \pi(X_2 = 1) & \text{for } i = 1, \\ 0 & \text{for } i = 2, \end{cases} \tag{27}$$

where $\pi(X_2 = 1)$ is the probability that $X_2 = 1$ under the distribution $\pi$. From the above
Bayesian equilibrium strategies, we define the virtual players' decision rules for time
$T = 2$ as $\psi^i_2(\pi) = \gamma^i$, $i = 1, 2$.

2) At time $t = 1$, since there is no common information, the common information based
belief $\pi_1$ is simply the prior belief on $(X_1, Y^2_1)$. Since the initial state is equally likely to
be 0 or 1,

$$\pi_1(x, y^2) = \frac{1}{2}\left(\frac{2}{3} 1_{\{y^2 = x\}} + \frac{1}{3} 1_{\{y^2 \neq x\}}\right).$$

We define the one-stage Bayesian game $SG_1(\pi_1)$ where

a) The probability distribution on $(X_1, Y^2_1)$ is $\pi_1$.

b) Agent 1 observes $X_1$ and selects an action $U^1_1$; Agent 2 observes $Y^2_1$ and selects $U^2_1$.

c) Agent $i$'s cost is given by $c^i(X_1, U^1_1, U^2_1) + V^i_2(F_1(X_1, U^1_1, U^2_1))$, where $F_1$, defined
by (26) and (25), gives the common information belief at time 2 as a function of
$X_1, U^1_1, U^2_1$, and $V^i_2$, defined in (27), gives the expected equilibrium cost for time 2
as a function of the common information belief at time 2.

For example, if $U^1_1 \neq U^2_1$, then (24)-(27) imply $V^1_2(F_1(X_1, U^1_1, U^2_1)) = 3/5$.
Similarly, if $U^1_1 = U^2_1$, then (24)-(27) imply $V^1_2(F_1(0, U^1_1, U^2_1)) = 3/4$ and
$V^1_2(F_1(1, U^1_1, U^2_1)) = 1/2$. Also, (27) implies that $V^2_2$ is identically 0.
A Bayesian Nash equilibrium of this game is a pair of strategies $\delta^1, \delta^2$ such that

• For $x = 0, 1$, $\delta^1(x)$ is a solution of

$$\min_{u^1} E^{\pi_1}\big[c^1(X_1, u^1, \delta^2(Y^2_1)) + V^1_2(F_1(X_1, u^1, \delta^2(Y^2_1))) \,\big|\, X_1 = x\big].$$

• For $y = 0, 1$, $\delta^2(y)$ is a solution of

$$\min_{u^2} E^{\pi_1}\big[c^2(X_1, \delta^1(X_1), u^2) + V^2_2(F_1(X_1, \delta^1(X_1), u^2)) \,\big|\, Y^2_1 = y\big].$$

It is easy to verify that

$$\delta^1(x) = 1 - x, \quad \delta^2(y) = 1 - y$$

is a Bayesian Nash equilibrium of $SG_1(\pi_1)$. The expected equilibrium costs, including the
continuation cost from time 2, are

$$V^i_1(\pi_1) = E^{\pi_1}\big[c^i(X_1, \delta^1(X_1), \delta^2(Y^2_1)) + V^i_2(F_1(X_1, \delta^1(X_1), \delta^2(Y^2_1)))\big],$$

which gives $V^1_1(\pi_1) = 47/60$ and $V^2_1(\pi_1) = 1/3$. From the above Bayesian equilibrium
strategies, we define the virtual players' decision rules for time $t = 1$ as $\psi^i_1(\pi_1) = \delta^i$,
$i = 1, 2$.
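
The stage-1 claims can be checked with exact rational arithmetic. The sketch below is a self-contained verification under assumed readings of the example: cost matrices taken row by row with one matrix per state value, and continuation costs 3/5, 3/4 and 1/2 for controller 1 and 0 for controller 2. The helper names (`v2`, `total`, `d1`, `d2`) are hypothetical.

```python
from fractions import Fraction as Fr

# COST[x][u1][u2] = (controller 1's cost, controller 2's cost); assumed reading.
COST = {
    0: [[(1, 0), (0, 1)], [(0, 1), (0, 0)]],
    1: [[(0, 0), (1, 1)], [(0, 1), (1, 0)]],
}

# Prior pi_1 on (X_1, Y_1^2): fair initial state, channel error probability 1/3.
PRIOR = {(x, y): Fr(1, 2) * (Fr(2, 3) if y == x else Fr(1, 3))
         for x in (0, 1) for y in (0, 1)}

def v2(i, x1, u1, u2):
    """Continuation cost V_2^i(F_1(x1, u1, u2)): pi_2(X_2 = 1) for i = 1, zero for i = 2."""
    if i == 2:
        return Fr(0)
    if u1 != u2:
        return Fr(3, 5)
    return Fr(3, 4) if x1 == 0 else Fr(1, 2)

def total(i, x, u1, u2):
    """Agent i's cost in the one-stage game SG_1: stage cost plus continuation."""
    return COST[x][u1][u2][i - 1] + v2(i, x, u1, u2)

def d1(x):
    return 1 - x   # candidate equilibrium strategy of agent 1

def d2(y):
    return 1 - y   # candidate equilibrium strategy of agent 2

def expected_cost(i):
    return sum(p * total(i, x, d1(x), d2(y)) for (x, y), p in PRIOR.items())

def best_response_holds():
    for x in (0, 1):   # agent 1, observing X_1 = x
        costs = [sum(p * total(1, x, u, d2(y)) for (xx, y), p in PRIOR.items() if xx == x)
                 for u in (0, 1)]
        if costs[d1(x)] > min(costs):
            return False
    for y in (0, 1):   # agent 2, observing Y_1^2 = y
        costs = [sum(p * total(2, x, d1(x), u) for (x, yy), p in PRIOR.items() if yy == y)
                 for u in (0, 1)]
        if costs[d2(y)] > min(costs):
            return False
    return True

print(best_response_holds())
print(expected_cost(1), expected_cost(2))  # 47/60 1/3
```

The conditional costs are left unnormalized because dividing by the probability of the observation does not change which action is minimizing.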

Since we now know the equilibrium decision rules $\psi^i_t$, $i = 1, 2$, $t = 1, 2$ for the virtual
players, we can construct the corresponding control laws for the controllers using Theorem 3.
Thus, a common information based Markov perfect equilibrium for the game in
this example is given by the strategies:

$$g^1_1(x_1, \pi_1) = 1 - x_1, \quad g^2_1(y^2_1, \pi_1) = 1 - y^2_1,$$

and

$$g^1_2(x_2, \pi_2) = 1, \quad g^2_2(y^2_2, \pi_2) = 1.$$

V. Behavioral Strategies and Existence of Equilibrium

The results of Theorems 2 and 3 provide sufficient conditions for a pair of strategies to
be an equilibrium of game G2. Neither of these results addresses the question of existence
of equilibrium. In particular, the result of Theorem 3 states that the (pure strategy) Bayesian
Nash equilibria of the one-stage Bayesian games $SG_t(\pi)$, $t = T, \ldots, 1$, may be used to find a
Markov perfect equilibrium of game G2 and hence a common information based Markov perfect
equilibrium of G1. However, the games $SG_t(\pi)$ may not have any (pure strategy) Bayesian Nash
equilibrium.

As is common in finite games, we need to allow for behavioral strategies in order to ensure
the existence of equilibria. Toward that end, we now reconsider the model of game G1. At each
time $t$, each controller is now allowed to select a probability distribution $D^i_t$ over the (finite) set
of actions $\mathcal{U}^i_t$, $i = 1, 2$ according to a control law of the form:

$$D^i_t = g^i_t(P^i_t, C_t). \tag{28}$$

The rest of the model is the same as in Section II. We denote the set of probability distributions
over $\mathcal{U}^i_t$ by $\Delta(\mathcal{U}^i_t)$.

Following exactly the same arguments as in Section IV, we can define an equivalent game
where virtual players select prescriptions that are functions from the set of private information
$\mathcal{P}^i_t$ to the set $\Delta(\mathcal{U}^i_t)$ and establish the result of Theorem 1 for this case. A sufficient condition for
Markov perfect equilibrium of this game is given by Theorem 2 where $\tilde\gamma^i$ are now interpreted
as mappings from $\mathcal{P}^i_t$ to $\Delta(\mathcal{U}^i_t)$ (instead of mappings from $\mathcal{P}^i_t$ to $\mathcal{U}^i_t$). Given a Markov perfect
equilibrium $(\psi^1, \psi^2)$ of the virtual players' game, the equivalent strategies $g^i_t(\cdot, \pi) := \psi^i_t(\pi)$ form
a common information based Markov perfect equilibrium of game G1 in behavioral strategies.

Further, we can follow a backward induction procedure identical to the one used in Section IV-C
(Algorithm 1), but now consider mixed strategy Bayesian Nash equilibria of the one-stage
Bayesian games $SG_t(\pi)$ constructed there. We proceed as follows:
Algorithm 2:

1) At the terminal time $T$, for each realization $\pi$ of the common information based belief at
time $T$, consider the one-stage Bayesian game $SG_T(\pi)$ defined in Algorithm 1. A mixed
strategy $\gamma^i$ for the game $SG_T(\pi)$ is a mapping from $\mathcal{P}^i_T$ to $\Delta(\mathcal{U}^i_T)$. A mixed strategy
Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^1, \gamma^2$ such that for any
realization $p^i$, $\gamma^i(p^i)$ assigns zero probability to any action that is not a solution of the
minimization problem

$$\min_{u^i} E^{\pi}\big[c^i(X_T, u^i, U^j_T) \,\big|\, P^i_T = p^i\big],$$

where $U^j_T$ is distributed according to $\gamma^j(P^j_T)$. Since $SG_T(\pi)$ is a finite Bayesian game, a
mixed strategy equilibrium is guaranteed to exist [22]. For any mixed strategy Bayesian
Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of $SG_T(\pi)$, denote the expected equilibrium costs as $V^i_T(\pi)$ and
define $\psi^i_T(\pi) := \gamma^{i*}$, $i = 1, 2$.

2) At time $t < T$, for each realization $\pi$ of the common information based belief at time $t$,
consider the one-stage Bayesian game $SG_t(\pi)$ defined in Algorithm 1. A mixed strategy
Bayesian Nash equilibrium of this game is a pair of strategies $\gamma^1, \gamma^2$ such that for any
realization $p^i$, $\gamma^i(p^i)$ assigns zero probability to any action that is not a solution of the
minimization problem

$$\min_{u^i} E^{\pi}\big[c^i(X_t, u^i, U^j_t) + V^i_{t+1}(F_t(\pi, Z_{t+1})) \,\big|\, P^i_t = p^i\big],$$

where $U^j_t$ is distributed according to $\gamma^j(P^j_t)$ and $Z_{t+1}$ is the increment in common
information generated according to the state dynamics, the observation equations and the
common information update when control actions $U^i_t = u^i$ and
$U^j_t$ distributed according to $\gamma^j(P^j_t)$ are used. Since $SG_t(\pi)$ is a finite Bayesian game, a mixed
strategy equilibrium is guaranteed to exist [22]. For any mixed strategy Bayesian Nash
equilibrium $\gamma^{1*}, \gamma^{2*}$ of $SG_t(\pi)$, denote the expected equilibrium costs as $V^i_t(\pi)$ and define
$\psi^i_t(\pi) := \gamma^{i*}$, $i = 1, 2$.

We can now state the following theorem. 

Theorem 4 For the finite game G1, a common information based Markov perfect equilibrium in
behavioral strategies always exists. Further, this equilibrium can be found by first constructing
strategies $\psi^1, \psi^2$ according to the backward inductive procedure of Algorithm 2 and then defining
behavioral strategies $g^1, g^2$ in G1 as

$$g^i_t(\cdot, \pi_t) := \psi^i_t(\pi_t),$$

$i = 1, 2$, $t = 1, 2, \ldots, T$. □
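
The equilibrium condition used in Algorithm 2 is a support condition: each mixed strategy may put positive probability only on actions that minimize the conditional expected cost. The sketch below checks that condition for a candidate profile in a finite two-agent one-stage game. It does not compute an equilibrium, and the function name `is_mixed_bne`, the dictionary encodings and the matching-pennies costs are hypothetical illustrations.

```python
def is_mixed_bne(prior, cost, g1, g2, actions1, actions2, tol=1e-9):
    """Check the support condition for a mixed-strategy Bayesian Nash equilibrium.

    prior:  dict {(x, p1, p2): probability}, playing the role of the belief pi
    cost:   cost(i, x, u1, u2) -> agent i's cost (including any continuation term)
    g1, g2: dict {observation: {action: probability}} for agents 1 and 2
    """
    def cond_cost(i, p, u):
        # Unnormalized conditional expected cost of action u at observation p,
        # averaging over the other agent's randomized action.
        total = 0.0
        for (x, p1, p2), pr in prior.items():
            if i == 1 and p1 == p:
                total += pr * sum(q * cost(1, x, u, v) for v, q in g2[p2].items())
            elif i == 2 and p2 == p:
                total += pr * sum(q * cost(2, x, v, u) for v, q in g1[p1].items())
        return total

    for i, g, actions in ((1, g1, actions1), (2, g2, actions2)):
        for p, dist in g.items():
            costs = {u: cond_cost(i, p, u) for u in actions}
            best = min(costs.values())
            # Any action played with positive probability must be a minimizer.
            if any(q > tol and costs[u] > best + tol for u, q in dist.items()):
                return False
    return True

# Hypothetical matching-pennies costs with a single observation per agent:
# agent 1 pays when the actions match, agent 2 pays when they differ.
mp_prior = {(0, 0, 0): 1.0}

def mp_cost(i, x, u1, u2):
    return float(u1 == u2) if i == 1 else float(u1 != u2)

uniform = {0: {0: 0.5, 1: 0.5}}
pure_zero = {0: {0: 1.0}}
print(is_mixed_bne(mp_prior, mp_cost, uniform, uniform, [0, 1], [0, 1]))
print(is_mixed_bne(mp_prior, mp_cost, pure_zero, pure_zero, [0, 1], [0, 1]))
```

Matching pennies has no pure strategy equilibrium, which is the situation that motivates the move to behavioral strategies.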

VI. Discussion 

A. Importance of Assumption 2

The most restrictive assumption in our analysis of game G1 is Assumption 2, which states
that the common information based belief is independent of control strategies. It is instructive to
consider why our analysis does not work in the absence of this assumption. Let us consider the
model of Section II with Assumption 1 as before but without Assumption 2. Lemma 1, which
follows from Assumption 1, is still true. For this version of game G1 without Assumption 2, we
can construct an equivalent game with virtual players similar to game G2. Further, it is easy to
show that Theorem 1, which relates equilibria of G2 to those of G1, is still true.

The key result for our analysis of game G2 in Section IV was Lemma 8, which allowed us to
use $\pi_t$ as a Markov state and to define and characterize Markov perfect equilibria for the game
G2. Lemma 8 essentially states that the set of Markov decision strategy pairs (that is, strategies
that select prescriptions as a function of $\pi_t$) is closed with respect to the best response mapping.
In other words, if we start with any pair of Markov strategies $(\psi^1, \psi^2)$ for the virtual players and
define $\chi^i$ to be the best response of virtual player $i$ to $\psi^j$, then, for at least one choice of best
response strategies, the pair $(\chi^1, \chi^2)$ belongs to the set of Markov strategy pairs. This is true
not just for strategies $(\psi^1, \psi^2)$ that form an equilibrium but for any choice of Markov strategies.
We will now argue that this is not necessarily true without Assumption 2.

Recall that due to Lemma 1, the belief $\pi_t$ evolves as

$$\pi_t = F_{t-1}(\pi_{t-1}, \gamma^1_{t-1}, \gamma^2_{t-1}, z_t).$$

Thus, in order to evaluate the current realization of $\pi_t$, a virtual player must know the
prescriptions used by both virtual players. However, the virtual players do not observe each other's past
prescriptions since the only data they have available is $c_t$. Thus, a virtual player cannot evaluate
the belief $\pi_t$ without knowing (or assuming) how the other player selects its prescriptions.

Consider now decision strategies $(\psi^1, \psi^2)$ for the two virtual players which operate as follows:
At each time $t$, the prescriptions chosen by virtual players are

$$\gamma^i_t = \psi^i_t(\pi_t) \tag{29}$$

and the belief at the next time $t + 1$ is

$$\pi_{t+1} = F_t(\pi_t, \psi^1_t(\pi_t), \psi^2_t(\pi_t), z_{t+1}). \tag{30}$$

Assume that the above strategies are not a Nash equilibrium for the virtual players' game.
Therefore, one virtual player, say virtual player 2, can benefit by deviating from its strategy.
Given that virtual player 1 continues to operate according to (29) and (30), is it possible for
virtual player 2 to reduce its cost by using a non-Markov strategy, that is, a strategy that selects
prescriptions based on more data than just $\pi_t$? Consider any time $t$. If virtual player 2 has
deviated to some other choice of Markov decision rules $\tilde\psi^2_{1:t-1}$ in the past, then the true belief
on state and private information given the common information, denoted here by $\tilde\pi_t$,
is different from the belief $\pi_t$ evaluated by the first player according to (30). (Note that since
past prescriptions are not observed and virtual player 1's operation is fixed by (29) and (30),
virtual player 1 continues to use $\pi_t$ evolving according to (30) as its belief.) Even though $\pi_t$ is
no longer the true belief, virtual player 2 can still track its evolution using (30). Using arguments
similar to those in the proofs of Lemmas 7 and 8, it can be established that an optimal strategy
for virtual player 2, given that virtual player 1 operates according to (29) and (30), is of the
form $\gamma^2_t = \psi^{2*}_t(\tilde\pi_t, \pi_t)$, where $\tilde\pi_t$ is the true conditional belief on state and private information
given the common information whereas $\pi_t$ is given by (30). Thus, the best response of player
2 may not necessarily be a Markov strategy and hence Lemma 8 may no longer hold. Without
Lemma 8, we cannot define Markov perfect equilibrium of game G2 using $\pi_t$ as the state.

B. The Case of Team Problems

The game G1 is referred to as a team problem if the two controllers have the same cost
functions, that is, $c^1(\cdot) = c^2(\cdot) = c^{team}(\cdot)$. Nash equilibrium strategies can then be interpreted
as person-by-person optimal strategies [23]. Clearly, the results of Sections IV and V apply to
person-by-person optimal strategies for team problems as well.

For team problems, our results can be strengthened in two ways. Firstly, we can find globally
optimal strategies for the controllers in the team using the virtual player approach and secondly,
we no longer need to make Assumption 2. Let us retrace our steps in Section IV for the team
problem without Assumption 2.

1) We can once again introduce virtual players that observe the common information and
select prescriptions for the controllers. The two virtual players have the same cost function.
So game G2 is now a team problem and we will refer to it as T2. It is straightforward
to establish that globally optimal strategies for the virtual players can be translated to globally
optimal strategies for the controllers in the team in a manner identical to Theorem 1.

2) Since we are no longer making Assumption 2, the common information belief evolves
according to

$$\pi_t = F_{t-1}(\pi_{t-1}, \gamma^1_{t-1}, \gamma^2_{t-1}, z_t). \tag{31}$$

Virtual player 1 does not observe $\gamma^2_{t-1}$, so it cannot carry out the update described in (31).
However, we will now increase the information available to virtual players and assume
that each virtual player can indeed observe all past prescriptions $\gamma^1_{1:t-1}, \gamma^2_{1:t-1}$. We refer
to this team with expanded information for the virtual players as T2'.
It should be noted that the globally optimal expected cost for T2' can be no larger than
the globally optimal cost of T2 since we have only added information in going from T2
to T2'. We will later show that the globally optimal strategies we find for T2' can be
translated to equivalent strategies for T2 with the same expected cost.

3) For T2', since all past prescriptions are observed, both virtual players can evaluate $\pi_t$
using (31) without knowing the past decision rules $\psi^1_{1:t-1}, \psi^2_{1:t-1}$. We can now repeat the
arguments in the proof of Lemma 7 to show that an analogous result is true for team T2'
as well. The team problem for the virtual players is now a Markov decision problem with
$\pi_t$ evolving according to (31) as the Markov state and the prescription pair $(\gamma^1_t, \gamma^2_t)$ as the
decision. We can then write a dynamic program for this Markov decision problem.

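Theorem 5 For the team problem T2' with virtual players, for each realization of $\pi_t$, the
optimal prescriptions are the minimizers in the following dynamic program:

$$V^{team}_T(\pi) := \min_{\tilde\gamma^1, \tilde\gamma^2} E\big[c^{team}(X_T, \Gamma^1_T(P^1_T), \Gamma^2_T(P^2_T)) \,\big|\, \Pi_T = \pi, \Gamma^1_T = \tilde\gamma^1, \Gamma^2_T = \tilde\gamma^2\big] \tag{32}$$

$$V^{team}_t(\pi) := \min_{\tilde\gamma^1, \tilde\gamma^2} E\big[c^{team}(X_t, \Gamma^1_t(P^1_t), \Gamma^2_t(P^2_t)) + V^{team}_{t+1}(\Pi_{t+1}) \,\big|\, \Pi_t = \pi, \Gamma^1_t = \tilde\gamma^1, \Gamma^2_t = \tilde\gamma^2\big] \tag{33}$$

where $\Pi_{t+1} = F_t(\Pi_t, \Gamma^1_t, \Gamma^2_t, Z_{t+1})$.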

4) Let $\psi^{i*}_t(\pi)$ be the minimizer in the right hand side of the definition of $V^{team}_t(\pi)$ in the
above dynamic program. The globally optimal virtual players' operation can be described
as: At each $t$, evaluate

$$\pi_t = F_{t-1}(\pi_{t-1}, \gamma^1_{t-1}, \gamma^2_{t-1}, z_t) \tag{34}$$

and then select the prescriptions

$$\gamma^i_t = \psi^{i*}_t(\pi_t), \quad i = 1, 2. \tag{35}$$

Now, instead of operating according to (34) and (35), assume that virtual players operate
as follows: At each $t$, evaluate

$$\pi_t = F_{t-1}(\pi_{t-1}, \psi^{1*}_{t-1}(\pi_{t-1}), \psi^{2*}_{t-1}(\pi_{t-1}), z_t) \tag{36}$$

and then select the prescriptions

$$\gamma^i_t = \psi^{i*}_t(\pi_t), \quad i = 1, 2. \tag{37}$$

It should be clear that virtual players operating according to (36) and (37) will achieve the
same globally optimal performance as the virtual players operating according to (34) and
(35). Furthermore, the virtual players in T2 can follow (36) and (37) and thus achieve the
same globally optimal performance as in T2'.

Thus, to find globally optimal strategies for the team of virtual players in the absence of
Assumption 2, we first increased their information to include past prescriptions and then mapped
the globally optimal strategies with increased information to equivalent strategies with original
information.

For the game G2 in the absence of Assumption 2, we cannot follow the above approach of first
increasing virtual players' information to include past prescriptions, finding an equilibrium with
added information and then mapping the equilibrium strategies to equivalent strategies with
original information. To see the reason, let us denote the virtual player operation given by (34)
and (35) by the strategy $\sigma^i$, $i = 1, 2$ and the virtual player operation given by (36) and (37) by the
strategy $\hat\sigma^i$, $i = 1, 2$. Then, while it is true that $J^i(\sigma^1, \sigma^2) = J^i(\hat\sigma^1, \hat\sigma^2)$, $i = 1, 2$, for some
other strategies $\rho^1, \rho^2$, it is not necessarily true that $J^i(\sigma^i, \rho^j) = J^i(\hat\sigma^i, \rho^j)$, $i, j = 1, 2$, $i \neq j$.
Therefore, the equilibrium conditions for $\sigma^1, \sigma^2$:

$$J^1(\sigma^1, \sigma^2) \le J^1(\rho^1, \sigma^2), \quad \text{and} \quad J^2(\sigma^1, \sigma^2) \le J^2(\sigma^1, \rho^2), \tag{38}$$

do not necessarily imply the equilibrium conditions for $\hat\sigma^1, \hat\sigma^2$:

$$J^1(\hat\sigma^1, \hat\sigma^2) \le J^1(\rho^1, \hat\sigma^2), \quad \text{and} \quad J^2(\hat\sigma^1, \hat\sigma^2) \le J^2(\hat\sigma^1, \rho^2). \tag{39}$$



Remark 5 Our dynamic program for the team problem is similar to the dynamic program for 
teams obtained in E4ll using a slightly different but conceptually similar approach. D 

VII. Concluding Remarks 

We considered the problem of finding Nash equilibria of a general model of stochastic 
games with asymmetric information. Our analysis relied on the nature of common and private 
information among the controllers. Crucially, we assumed that the common information among 
controllers is increasing with time and that a common information based belief on the system 
state and private information is independent of control strategies. Under these assumptions, the 
game with asymmetric information is shown to be equivalent to another game with symmetric 
information for which we obtained a characterization of Markov perfect equilibria. This charac- 
terization allowed us to provide a backward induction algorithm to find Nash equilibria of the 
original game. Each step of this algorithm involves finding Bayesian Nash equilibria of a one- 
stage Bayesian game. The class of Nash equilibria of the original game that can be characterized 
in this backward manner are named common information based Markov perfect equilibria.

The class of common information based Markov perfect equilibria for asymmetric information
games bears conceptual similarities with Markov perfect equilibria of symmetric information 
games with perfect state observation. In symmetric information games with perfect state obser- 
vation, a controller may be using past state information only because the other controller is using 
that information. Therefore, if one controller restricts to Markov strategies, the other controller 
can do the same. This observation provides the justification for focusing only on Markov perfect 
equilibria for such games. Our results show that a similar observation can be made in our model 
of games with asymmetric information. A controller may be using the entire common information 
only because the other controller is using that information. If one controller chooses to only use the
common information based belief on the state and private information, the other controller can 
do the same. Thus, it is reasonable to focus on the class of common information based Markov 
perfect equilibria for our model of games with asymmetric information. 

Further, for zero-sum games, the uniqueness of the value of the game implies that the equi- 
librium cost of a common information based Markov perfect equilibrium is the same as the 
equilibrium cost of any other Nash equilibrium ||2TT|. 

For finite games, it is always possible to find pure strategy Nash equilibria (if they exist) by
a brute force search of the set of possible strategy profiles. The number of strategy choices for
controller $i$ is $|\mathcal{U}^i_1|^{|\mathcal{P}^i_1 \times \mathcal{C}_1|} \times \cdots \times |\mathcal{U}^i_T|^{|\mathcal{P}^i_T \times \mathcal{C}_T|}$. For simplicity, assume that the set of possible
realizations of private information $\mathcal{P}^i_t$ does not change with time. However, because the common
information is required to be increasing with time (see Assumption 1), the cardinality of the set of
possible realizations of common information $\mathcal{C}_t$ is exponentially increasing with time. Thus, the
number of possible control strategies exhibits a double exponential growth with time.

Algorithm 1 provides an alternative way for finding an equilibrium by solving a succession
of one-stage Bayesian games. But how many such games need to be solved? At each time $t$, we
need to solve a Bayesian game for each possible realization of the belief $\pi_t$. Let $\mathcal{R}_t$ denote the
set of possible realizations of the belief $\pi_t$. Since the belief is simply a function of the common
information, we must have that $|\mathcal{R}_t| \le |\mathcal{C}_t|$. Thus, the total number of one-stage games that need
to be solved is no larger than $\sum_{t=1}^{T} |\mathcal{C}_t|$. Recalling the exponential growth of $|\mathcal{C}_t|$, the number of
one-stage games to solve shows an exponential growth with time. This is clearly better than the
double exponential growth for the brute force search.
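
To give a feel for the two growth rates, the sketch below evaluates both counts under illustrative assumptions that are not stated in this form in the text: binary actions and private information, no common information at $t = 1$ (so $|\mathcal{C}_1| = 1$), and a common information set that grows by a factor of 16 per step, as it would if four binary quantities were revealed each step. The function name `search_sizes` and its parameters are hypothetical.

```python
import math

def search_sizes(T, n_actions=2, n_private=2, growth=16):
    """Return (log2 of the number of pure strategies of one controller,
    upper bound on the number of one-stage games) for horizon T."""
    # |C_t| for t = 1, ..., T under the assumed geometric growth.
    c_sizes = [growth ** (t - 1) for t in range(1, T + 1)]
    # Number of strategies is prod_t |U|^(|P| * |C_t|); take log base 2.
    log2_strategies = n_private * sum(c_sizes) * math.log2(n_actions)
    # Algorithm 1 solves at most one game per realization of C_t.
    max_stage_games = sum(c_sizes)
    return log2_strategies, max_stage_games

for T in (2, 3, 4):
    print(T, search_sizes(T))
```

The first number is an exponent, so the brute force search space is 2 raised to that value, while the second number bounds the games solved by Algorithm 1 directly.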

Two features may further reduce the complexity of Algorithm 1. Firstly, the set $\mathcal{R}_t$
may not be growing exponentially with time (as in the case of the information structure in
Section IV-D, where $|\mathcal{R}_t| = 3$ for all $t > 1$). Secondly, the one-stage games at time $t$, $SG_t(\pi)$,
may possess enough structure that it is possible to find an equilibrium for a generic $\pi$ that can be
used to construct equilibria for all choices of $\pi$. For finite games, it is not clear what additional
features need to be present in game G1 such that the resulting one-stage games $SG_t(\pi)$ can be
solved for a generic $\pi$. In the sequel to this paper we will extend the approach used here to linear
quadratic Gaussian games and show that in these games it is possible to solve the one-stage
games for a generic belief $\pi$.

Conceptually, the approach adopted in this paper can be extended to infinite time horizon 
games with discounted costs under suitable stationarity conditions. However, in infinite horizon 
games, the number of possible realizations of the common information based belief would, in 
general, be infinite. Establishing the existence of common information based Markov perfect 
equilibria for infinite horizon games would be an interesting direction for future work in this 
area.

VIII. Acknowledgments

Appendix A
Proof of Lemma 1

Consider a realization $c_t$ of the common information $C_t$ at time $t$. Let $\gamma^1_t, \gamma^2_t$ be the
corresponding realization of the partial functions of the control laws at time $t$, that is, $\gamma^i_t = g^i_t(\cdot, c_t)$.
Given the realization of the common information based belief $\pi_t$ and the partial functions $\gamma^1_t, \gamma^2_t$,
we can find the joint conditional distribution on $(X_t, P^1_t, P^2_t, X_{t+1}, P^1_{t+1}, P^2_{t+1}, Z_{t+1})$ conditioned
on the common information at time $t$ as follows:

(40)

Note that in addition to the arguments on the left side of conditioning in (40), we only need
$\pi_t$ and $\gamma^1_t, \gamma^2_t$ to evaluate the right hand side of (40). That is, the joint conditional distribution
on $(X_t, P^1_t, P^2_t, X_{t+1}, P^1_{t+1}, P^2_{t+1}, Z_{t+1})$ depends only on $\pi_t$, $\gamma^1_t$ and $\gamma^2_t$ with no dependence on
control strategies.

We can now consider the common information based belief at time $t + 1$,

(41)

The numerator and denominator of (41) are both marginals of the probability in (40). Using (40)
in (41) gives $\pi_{t+1}$ as a function of $\pi_t, \gamma^1_t, \gamma^2_t, z_{t+1}$.

Appendix B
Proof of Lemma 7

Consider a realization $c_t$ of common information at time $t$ and realizations $\pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}$ of
beliefs and prescriptions till time $t$. Because of (10) in Assumption 2, we have

$$\Pi_{t+1} = F_t(\Pi_t, Z_{t+1}).$$

Hence, in order to establish the lemma, it suffices to show that

$$P(Z_{t+1} \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = P(Z_{t+1} \mid \pi_t, \gamma^1_t, \gamma^2_t). \tag{42}$$

Recall that

$$Z_{t+1} = \zeta_{t+1}(P^1_t, P^2_t, U^1_t, U^2_t, Y^1_{t+1}, Y^2_{t+1}) = \zeta_{t+1}(P^1_t, P^2_t, \gamma^1_t(P^1_t), \gamma^2_t(P^2_t), Y^1_{t+1}, Y^2_{t+1}), \tag{43}$$

where we used the fact that the control actions are simply the prescriptions evaluated at the
private information. Therefore,

$$P(Z_{t+1} = z \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t})$$
$$= \sum_{x_t, x_{t+1}, y^1_{t+1}, y^2_{t+1}, p^1_t, p^2_t} P(Z_{t+1} = z, x_t, x_{t+1}, y^1_{t+1}, y^2_{t+1}, p^1_t, p^2_t \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t})$$
$$= \sum_{x_t, x_{t+1}, y^1_{t+1}, y^2_{t+1}, p^1_t, p^2_t} 1_{\{\zeta_{t+1}(p^1_t, p^2_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t), y^1_{t+1}, y^2_{t+1}) = z\}} P(y^1_{t+1}, y^2_{t+1} \mid x_{t+1})$$
$$\qquad \times P(x_{t+1} \mid x_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t)) P(x_t, p^1_t, p^2_t \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t})$$
$$= \sum_{x_t, x_{t+1}, y^1_{t+1}, y^2_{t+1}, p^1_t, p^2_t} 1_{\{\zeta_{t+1}(p^1_t, p^2_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t), y^1_{t+1}, y^2_{t+1}) = z\}} P(y^1_{t+1}, y^2_{t+1} \mid x_{t+1})$$
$$\qquad \times P(x_{t+1} \mid x_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t)) \pi_t(x_t, p^1_t, p^2_t), \tag{44}$$

where we used the fact that $P(x_t, p^1_t, p^2_t \mid c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}) = P(x_t, p^1_t, p^2_t \mid c_t)$, since $\pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}$
are all functions of $c_t$, and the fact that $P(x_t, p^1_t, p^2_t \mid c_t) =: \pi_t(x_t, p^1_t, p^2_t)$. The right hand side in
(44) depends only on $\pi_t$ and $\gamma^1_t, \gamma^2_t$. Thus, the conditional probability of $Z_{t+1} = z$ conditioned
on $c_t, \pi_{1:t}, \gamma^1_{1:t}, \gamma^2_{1:t}$ depends only on $\pi_t$ and $\gamma^1_t, \gamma^2_t$. This establishes (42) and hence the lemma.

Appendix C 
Proof of Lemma [8] 

Assume that virtual player 1 is using a fixed strategy of the form Tj = ^(II t ), t = 1, 2, . . . , T. 
We now want to find a strategy of virtual player 2 that is a best response to the given strategy 
of virtual player 1. Lemma [7] established that U t is a controlled Markov process with the 
prescriptions Tj,Tf as the controlling actions. Since has been fixed to ^(ILJ, it follows 
that, under the fixed strategy of virtual player 1, U t can be viewed as a controlled Markov 
process with the decisions of virtual player 2, r| as the controlling action. 

At time $t$, if $c_t$ is the realization of the common information and $\pi_t$ is the corresponding 
realization of the common information based belief, then $\gamma^1_t = \psi^1_t(\pi_t)$ is the 
prescription selected by virtual player 1. If virtual player 2 selects $\gamma^2_t$, the expected 
instantaneous cost for virtual player 2 is 

$$
\begin{aligned}
E[c^2(X_t, U^1_t, U^2_t) \mid c_t] &= E[c^2(X_t, \gamma^1_t(P^1_t), \gamma^2_t(P^2_t)) \mid c_t] \\
&= \sum_{x_t, p^1_t, p^2_t} c^2(x_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t)) \, P(x_t, p^1_t, p^2_t \mid c_t) \\
&= \sum_{x_t, p^1_t, p^2_t} c^2(x_t, \gamma^1_t(p^1_t), \gamma^2_t(p^2_t)) \, \pi_t(x_t, p^1_t, p^2_t) =: \tilde{c}^2(\pi_t, \gamma^2_t).
\end{aligned}
\tag{45}
$$

Thus, given the fixed strategy of virtual player 1, the instantaneous expected cost for virtual 
player 2 depends only on the belief $\pi_t$ and the prescription selected by virtual player 2. Given 
the controlled Markov nature of $\Pi_t$, it follows that virtual player 2's optimization problem is a 
Markov decision problem with $\Pi_t$ as the state, and hence virtual player 2 can optimally select its 
prescription as a function of $\Pi_t$. This completes the proof of the lemma. 
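
The belief-averaged cost in (45) is easy to evaluate on a finite model. The sketch below is a 
minimal illustration under assumed encodings: the belief is a dictionary over $(x, p^1, p^2)$, 
prescriptions are dictionaries from private information to actions, and the helper names 
`expected_cost` and `myopic_best_prescription` are hypothetical. The second helper minimizes only 
the instantaneous cost, which is the whole problem at the final stage $T$; for $t < T$ the Markov 
decision problem would also include the continuation value. 

```python
from itertools import product


def expected_cost(pi, gamma1, gamma2, cost):
    """Belief-averaged instantaneous cost, as in (45).

    pi     : dict {(x, p1, p2): prob}
    gamma1 : dict p1 -> u1 (the fixed prescription of virtual player 1)
    gamma2 : dict p2 -> u2 (the prescription chosen by virtual player 2)
    cost   : callable (x, u1, u2) -> float, player 2's cost function
    """
    return sum(w * cost(x, gamma1[p1], gamma2[p2]) for (x, p1, p2), w in pi.items())


def myopic_best_prescription(pi, gamma1, types2, actions2, cost):
    """Enumerate every map types2 -> actions2 and return one minimizing expected_cost."""
    candidates = (dict(zip(types2, choice))
                  for choice in product(actions2, repeat=len(types2)))
    return min(candidates, key=lambda g2: expected_cost(pi, gamma1, g2, cost))
```

Brute-force enumeration is used purely for clarity; the number of prescriptions grows 
exponentially in the number of private information values. 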

Appendix D 
Proof of Theorem [2] 

Consider a strategy pair $(\psi^1, \psi^2)$ that satisfies the conditions of the theorem. For any 
$1 \le k \le T$ and any realization $c_k$ of the common information at time $k$, we want to show that the 
strategies form a Nash equilibrium of the sub-game starting from time $k$ with the costs given as 

$$
E\Big[\sum_{t=k}^{T} c^i(X_t, U^1_t, U^2_t) \,\Big|\, c_k\Big], \tag{46}
$$

$i = 1, 2$. If the strategy of player $j$ is fixed to $\psi^j_t$, $t = k, k+1, \ldots, T$, then by arguments similar 
to those in the proof of Lemma 8, the optimization problem for player $i$ starting from time $k$ 
onwards with the objective given by (46) is a Markov decision problem, which we denote by 
$\mathrm{MDP}^i_k$. Since $\psi^i_t$, $t = k, k+1, \ldots, T$, satisfy the conditions of Theorem 2 for player $i$, they 
satisfy the dynamic programming conditions of $\mathrm{MDP}^i_k$. Thus, $\psi^i_t$, $t = k, k+1, \ldots, T$, is the 
best response to $\psi^j_t$, $t = k, k+1, \ldots, T$, in the sub-game starting from time $k$. Interchanging the 
roles of $i$ and $j$ implies that the strategies $\psi^i_t, \psi^j_t$, $t = k, k+1, \ldots, T$, form an equilibrium of 
the sub-game starting from time $k$. Since $k$ was arbitrary, this completes the proof of the sufficiency 
part of the theorem. The converse follows from a similar MDP based argument. 

Appendix E 
Proof of Theorem [3] 

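Consider any realization $\pi$ of the common information based belief and consider a Bayesian 
Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of the game $SG_T(\pi)$. We will show that $\gamma^{1*}, \gamma^{2*}$ satisfy the value 
function conditions for time $T$ in Theorem 2. By definition of Bayesian Nash equilibrium, for 
every realization $p^1$ of $P^1_T$, 

$$
E^{\pi}[c^1(X_T, \gamma^{1*}(P^1_T), \gamma^{2*}(P^2_T)) \mid P^1_T = p^1] \le E^{\pi}[c^1(X_T, \gamma^{1}(P^1_T), \gamma^{2*}(P^2_T)) \mid P^1_T = p^1], \tag{47}
$$

for any choice of $\gamma^1$. Averaging over $p^1$, we get 

$$
\begin{aligned}
& E^{\pi}\big[E^{\pi}[c^1(X_T, \gamma^{1*}(P^1_T), \gamma^{2*}(P^2_T)) \mid P^1_T]\big] \le E^{\pi}\big[E^{\pi}[c^1(X_T, \gamma^{1}(P^1_T), \gamma^{2*}(P^2_T)) \mid P^1_T]\big] \\
\implies\; & E^{\pi}[c^1(X_T, \gamma^{1*}(P^1_T), \gamma^{2*}(P^2_T))] \le E^{\pi}[c^1(X_T, \gamma^{1}(P^1_T), \gamma^{2*}(P^2_T))],
\end{aligned}
\tag{48}
$$

where all the expectations are with respect to the belief $\pi$ on $(X_T, P^1_T, P^2_T)$. Similarly, 

$$
E^{\pi}[c^2(X_T, \gamma^{1*}(P^1_T), \gamma^{2*}(P^2_T))] \le E^{\pi}[c^2(X_T, \gamma^{1*}(P^1_T), \gamma^{2}(P^2_T))], \tag{49}
$$

for any choice of $\gamma^2$. Thus, $\psi^i_T(\pi) := \gamma^{i*}$, $i = 1, 2$, satisfy the conditions in (20) and (21) when 
$\Pi_T = \pi$. 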

Similarly, for any time $t < T$, consider any realization $\pi$ of the common information based 
belief at $t$ and consider a Bayesian Nash equilibrium $\gamma^{1*}, \gamma^{2*}$ of the game $SG_t(\pi)$. Then, by 
definition of Bayesian Nash equilibrium, for every realization $p^1$ and any choice of $\gamma^1$, we have 
that the expression 

$$
E^{\pi}[c^1(X_t, \gamma^{1*}(P^1_t), \gamma^{2*}(P^2_t)) + V^1_{t+1}(F_t(\pi, Z_{t+1})) \mid P^1_t = p^1],
$$

(where $Z_{t+1}$ is the increment in common information generated according to the state dynamics, 
the observation equations and the common information update of the model when control actions 
$U^1_t = \gamma^{1*}(p^1)$ and $U^2_t = \gamma^{2*}(P^2_t)$ are used) can be no larger than 

$$
E^{\pi}[c^1(X_t, \gamma^{1}(P^1_t), \gamma^{2*}(P^2_t)) + V^1_{t+1}(F_t(\pi, Z_{t+1})) \mid P^1_t = p^1],
$$

(where $Z_{t+1}$ is the increment in common information generated in the same way when control 
actions $U^1_t = \gamma^{1}(p^1)$ and $U^2_t = \gamma^{2*}(P^2_t)$ are used). Similar conditions hold for 
player 2. Averaging over $p^1, p^2$ establishes that $\psi^i_t(\pi) := \gamma^{i*}$, $i = 1, 2$, satisfy the conditions in 
(22) and (23) when $\Pi_t = \pi$. 

Thus, the strategies $\psi^i$, $i = 1, 2$, defined by the backward induction procedure of Algorithm 1 
satisfy the conditions of Theorem 2 and hence form a Markov perfect equilibrium for game G2. 
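
To make the terminal step of the backward induction concrete, the following Python sketch 
enumerates pure-strategy Bayesian Nash equilibria of a small one-stage game of the kind $SG_T(\pi)$, 
by checking the type-by-type condition (47) for both players. It is only a sketch under stated 
assumptions: the names `interim_cost`, `all_prescriptions` and `pure_bne`, the dictionary encoding 
of the belief, and the player index `i` (0 for player 1, 1 for player 2) are all hypothetical; 
pure-strategy equilibria need not exist in general; and for $t < T$ the cost would have to be 
augmented with the continuation value $V^i_{t+1}(F_t(\pi, Z_{t+1}))$. 

```python
from itertools import product


def interim_cost(pi, i, p_i, g1, g2, cost):
    """Unnormalized E[c^i | P^i = p_i] under belief pi.

    The normalizing constant P(P^i = p_i) is common to both sides of (47),
    so it can be dropped when comparing prescriptions.
    """
    return sum(w * cost(i, x, g1[p1], g2[p2])
               for (x, p1, p2), w in pi.items() if (p1, p2)[i] == p_i)


def all_prescriptions(types, actions):
    """All maps from private information values to actions."""
    return [dict(zip(types, choice)) for choice in product(actions, repeat=len(types))]


def pure_bne(pi, types1, types2, actions1, actions2, cost, tol=1e-12):
    """Return every pure prescription pair from which no type gains by deviating."""
    equilibria = []
    for g1, g2 in product(all_prescriptions(types1, actions1),
                          all_prescriptions(types2, actions2)):
        ok1 = all(interim_cost(pi, 0, p, g1, g2, cost)
                  <= interim_cost(pi, 0, p, {**g1, p: a}, g2, cost) + tol
                  for p in types1 for a in actions1)
        ok2 = all(interim_cost(pi, 1, p, g1, g2, cost)
                  <= interim_cost(pi, 1, p, g1, {**g2, p: a}, cost) + tol
                  for p in types2 for a in actions2)
        if ok1 and ok2:
            equilibria.append((g1, g2))
    return equilibria
```

Each returned pair satisfies (47) for every type, so by the averaging argument above it also 
satisfies the averaged conditions (48) and (49). 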

Supplementary Material 

Appendix F 
Proof of Lemma [2] 

Proof: It is straightforward to verify that the structure of common and private information 
satisfies Assumption 1. We focus on the proof for Assumption 2. For a realization 
$y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}$ of the common information at time $t+1$, the common information 
based belief can be written as 

$$
\begin{aligned}
\pi_{t+1}(x_{t+1}, y^1_{t+1}, y^2_{t+1}) &= P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1}, Y^1_{t+1} = y^1_{t+1}, Y^2_{t+1} = y^2_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) \\
&= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \, P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \\
&\quad \times P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) \\
&= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \, P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \\
&\quad \times \sum_{x_t} \Big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) \Big],
\end{aligned}
\tag{50}
$$

where we used the dynamics and observation model to get the expression in (50). It can now be 
argued that in the last term in (50), we can remove the terms $u^1_t, u^2_t$ in the conditioning since 
they are functions of the rest of the terms $y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}$ in the conditioning. 
The last term in (50) would then be 

$$
P^{g^1_{1:t-1}, g^2_{1:t-1}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}),
$$

which is known to be independent of the choice of control laws $g^1_{1:t}, g^2_{1:t}$ [19]. Thus, $\pi_{t+1}$ is 
independent of the choice of control laws. For the sake of completeness, we provide a more detailed 
argument below. 

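The last term in (50) can be written as 

$$
\begin{aligned}
& P^{g^1_{1:t}, g^2_{1:t}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) \\
&= P^{g^1_{1:t-1}, g^2_{1:t-1}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}) \\
&= \frac{P^{g^1_{1:t-1}, g^2_{1:t-1}}(X_t = x_t, Y^1_t = y^1_t, Y^2_t = y^2_t \mid y^1_{1:t-1}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1})}{P^{g^1_{1:t-1}, g^2_{1:t-1}}(Y^1_t = y^1_t, Y^2_t = y^2_t \mid y^1_{1:t-1}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1})} \\
&= \frac{\pi_t(x_t, y^1_t, y^2_t)}{\sum_{x'} \pi_t(x', y^1_t, y^2_t)}.
\end{aligned}
\tag{51}
$$

Combining (50) and (51) establishes that $\pi_{t+1}$ is a function only of $\pi_t$ and $z_{t+1} = (y^1_t, y^2_t, u^1_t, u^2_t)$. 
Further, the transformation from $(\pi_t, z_{t+1})$ to $\pi_{t+1}$ does not depend on the choice of control 
strategies. ■ 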
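
For a finite model, the update obtained by combining (50) and (51) can be sketched in a few 
lines of Python. The function name `update_belief`, the dictionary encoding of $\pi_t$ over 
$(x_t, y^1_t, y^2_t)$, and the callables `trans`, `obs1` and `obs2` are assumptions made for 
illustration; the sketch assumes the two observation noises are independent given the state, as 
in the factorization used in (50). 

```python
def update_belief(pi, z, trans, obs1, obs2, states, ys1, ys2):
    """One-step belief update pi_t -> pi_{t+1} following (50)-(51).

    pi     : dict {(x, y1, y2): prob}, belief on (X_t, Y^1_t, Y^2_t)
    z      : tuple (y1, y2, u1, u2), the increment in common information
    trans  : callable (x_next, x, u1, u2) -> P(x_next | x, u1, u2)
    obs1/2 : callables (y, x) -> P(Y^i = y | X = x)
    states, ys1, ys2 : finite state and observation spaces
    """
    y1, y2, u1, u2 = z
    # Denominator of (51): probability of the revealed observations under pi.
    norm = sum(pi.get((x, y1, y2), 0.0) for x in states)
    if norm == 0.0:
        raise ValueError("the revealed observations have zero probability under pi")
    # (51): posterior on X_t once (y1, y2) are common knowledge.
    posterior = {x: pi.get((x, y1, y2), 0.0) / norm for x in states}
    new_pi = {}
    for x_next in states:
        # Prediction step inside (50).
        predicted = sum(trans(x_next, x, u1, u2) * posterior[x] for x in states)
        for a in ys1:
            for b in ys2:
                new_pi[(x_next, a, b)] = obs1(a, x_next) * obs2(b, x_next) * predicted
    return new_pi
```

No control law appears among the arguments: only the realized actions in `z` are used, which 
reflects the strategy independence established in the proof. 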

Appendix G 
Proof of Lemma [3] 

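Proof: It is straightforward to verify that the structure of common and private information 
satisfies Assumption 1. We focus on the proof for Assumption 2. For a realization 
$y^1_{1:t+1}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}$ of the common information at time $t+1$, the common information 
based belief can be written as 

$$
\begin{aligned}
\pi_{t+1}(x_{t+1}, y^2_{t+1}) &= P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1}, Y^2_{t+1} = y^2_{t+1} \mid y^1_{1:t+1}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) \\
&= P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \, P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1} \mid y^1_{1:t+1}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) \\
&= P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \, \frac{P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1}, Y^1_{t+1} = y^1_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})}{\sum_{x} P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x, Y^1_{t+1} = y^1_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t})}.
\end{aligned}
\tag{52}
$$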



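The numerator in the second term in (52) can be written as 

$$
\begin{aligned}
& P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \, P^{g^1_{1:t}, g^2_{1:t}}(X_{t+1} = x_{t+1} \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t}, u^2_{1:t}) \\
&= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \times \sum_{x_t} \Big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, P^{g^1_{1:t-1}, g^2_{1:t-1}}(X_t = x_t \mid y^1_{1:t}, y^2_{1:t}, u^1_{1:t-1}, u^2_{1:t-1}) \Big] \\
&= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \times \sum_{x_t} \Big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, \frac{P^{g^1_{1:t-1}, g^2_{1:t-1}}(X_t = x_t, y^2_t \mid y^1_{1:t}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1})}{P^{g^1_{1:t-1}, g^2_{1:t-1}}(y^2_t \mid y^1_{1:t}, y^2_{1:t-1}, u^1_{1:t-1}, u^2_{1:t-1})} \Big] \\
&= P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \sum_{x_t} \Big[ P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, \frac{\pi_t(x_t, y^2_t)}{\sum_{x'} \pi_t(x', y^2_t)} \Big].
\end{aligned}
\tag{53}
$$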



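Similar expressions can be obtained for the denominator of the second term in (52) to get 

$$
\begin{aligned}
\pi_{t+1}(x_{t+1}, y^2_{t+1}) &= P(Y^2_{t+1} = y^2_{t+1} \mid X_{t+1} = x_{t+1}) \\
&\quad \times \frac{P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x_{t+1}) \sum_{x_t} P(X_{t+1} = x_{t+1} \mid X_t = x_t, u^1_t, u^2_t) \, \pi_t(x_t, y^2_t)}{\sum_{x} P(Y^1_{t+1} = y^1_{t+1} \mid X_{t+1} = x) \sum_{x'} P(X_{t+1} = x \mid X_t = x', u^1_t, u^2_t) \, \pi_t(x', y^2_t)} \\
&=: F_t(\pi_t, y^1_{t+1}, y^2_t, u^1_t, u^2_t) = F_t(\pi_t, z_{t+1}).
\end{aligned}
\tag{54}
$$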

Appendix H 
Proof of Lemma [4] 

Assumption 1 is clearly satisfied. We focus on Assumption 2. Case A: For a realization 
$y^1_{1:t}, y^2_{1:t-d}, u^1_{1:t-1}$ of the common information, the common information based belief in this case 
can be written as: 

$$
\begin{aligned}
\pi_t(x_t, y^2_{t-d+1:t}) &= P^{g^1_{1:t-1}}(X_t = x_t, Y^2_{t-d+1:t} = y^2_{t-d+1:t} \mid y^1_{1:t}, y^2_{1:t-d}, u^1_{1:t-1}) \\
&= \sum_{x'_{t-d:t-1}} \Big[ P(Y^2_{t-d+1:t} = y^2_{t-d+1:t} \mid X_{t-d+1:t-1} = x'_{t-d+1:t-1}, X_t = x_t) \\
&\qquad \cdot P^{g^1_{1:t-1}}(X_t = x_t, X_{t-d:t-1} = x'_{t-d:t-1} \mid y^1_{1:t}, y^2_{1:t-d}, u^1_{1:t-1}) \Big].
\end{aligned}
\tag{55}
$$

The first term in (55) depends only on the noise statistics. To see how the second term in 
(55) is strategy independent, consider a centralized stochastic control problem with controller 
1 as the only controller, where the state process is $\tilde{X}_t := (X_{t-d:t})$ and the observation process is 
$\tilde{Y}_t := (Y^1_t, Y^2_{t-d})$. The second term in (55) is simply the information state $P(\tilde{X}_t \mid \tilde{y}_{1:t}, u^1_{1:t-1})$ 
of this centralized stochastic control problem, which is known to be strategy independent and 
satisfies an update equation of the form required by Lemma 4 [19]. 

Case B: Using arguments similar to those in Case A, the common information based belief 
$\pi_t$ for a realization $y^1_{1:t-1}, y^2_{1:t-d}, u^1_{1:t-1}$ of the common information can be written as: 

$$
\begin{aligned}
\pi_t(x_t, y^1_t, y^2_{t-d+1:t}) &= \sum_{x'_{t-d:t-1}} \Big[ P(Y^1_t = y^1_t, Y^2_{t-d+1:t} = y^2_{t-d+1:t} \mid X_{t-d+1:t-1} = x'_{t-d+1:t-1}, X_t = x_t) \\
&\qquad \cdot P^{g^1_{1:t-1}}(X_t = x_t, X_{t-d:t-1} = x'_{t-d:t-1} \mid y^1_{1:t-1}, y^2_{1:t-d}, u^1_{1:t-1}) \Big].
\end{aligned}
\tag{56}
$$

The second term in (56) is 

$$
\frac{P(y^2_{t-d} \mid x'_{t-d}) \, P^{g^1_{1:t-1}}(X_t = x_t, X_{t-d:t-1} = x'_{t-d:t-1} \mid y^1_{1:t-1}, y^2_{1:t-d-1}, u^1_{1:t-1})}{P^{g^1_{1:t-1}}(Y^2_{t-d} = y^2_{t-d} \mid y^1_{1:t-1}, y^2_{1:t-d-1}, u^1_{1:t-1})}. \tag{57}
$$

Both the numerator and the denominator can be shown to be strategy independent using the 
transformation to a centralized stochastic control problem described in Case A. 

Appendix I 
Proof of Lemma [5] 

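For a realization $y^0_{1:t+1}, u^1_{1:t}, u^2_{1:t}$ of the common information at time $t+1$, the belief $\pi_{t+1}$ is 
given as 

$$
\pi_{t+1}(x^0, x^1, x^2) = P^{g^1_{1:t}, g^2_{1:t}}(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid y^0_{1:t+1}, u^1_{1:t}, u^2_{1:t}) \tag{58}
$$

$$
= \frac{P(Y^0_{t+1} = y^0_{t+1} \mid X^0_{t+1} = x^0) \, P^{g^1_{1:t}, g^2_{1:t}}(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t})}{\sum_{x} P(Y^0_{t+1} = y^0_{t+1} \mid X^0_{t+1} = x) \, P^{g^1_{1:t}, g^2_{1:t}}(X^0_{t+1} = x \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t})}. \tag{59}
$$

The control strategy dependent term in the numerator in (59) can be written as 

$$
\begin{aligned}
& P^{g^1_{1:t}, g^2_{1:t}}(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t}) \\
&= \sum_{x'} P(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid X_t = x', u^1_t, u^2_t) \, P^{g^1_{1:t}, g^2_{1:t}}(X_t = x' \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t}) \\
&= \sum_{x'} P(X^0_{t+1} = x^0, X^1_{t+1} = x^1, X^2_{t+1} = x^2 \mid X_t = x', u^1_t, u^2_t) \, \pi_t(x').
\end{aligned}
\tag{60}
$$

Similarly, the control strategy dependent term in the denominator in (59) can be written as 

$$
P^{g^1_{1:t}, g^2_{1:t}}(X^0_{t+1} = x \mid y^0_{1:t}, u^1_{1:t}, u^2_{1:t}) = \sum_{x''} P(X^0_{t+1} = x \mid X_t = x'', u^1_t, u^2_t) \, \pi_t(x''). \tag{61}
$$

Substituting (60) and (61) in (59) establishes the lemma. 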

Appendix J 
Proof of Lemma [6] 

Consider a realization $c_t$ of the common information $C_t$ at time $t$. Given the realization 
of the common information based belief $\pi_t$, we can find the joint conditional distribution 
on $(X_t, P^1_t, P^2_t, X_{t+1}, P^1_{t+1}, P^2_{t+1}, Z_{t+1})$ conditioned on the common information at time $t$ as 
follows: 

$$
\begin{aligned}
& P(x_t, p^1_t, p^2_t, x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1} \mid c_t) \\
&= \sum_{y^1_{t+1}, y^2_{t+1}} P(x_t, p^1_t, p^2_t, x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1}, y^1_{t+1}, y^2_{t+1} \mid c_t) \\
&= \sum_{y^1_{t+1}, y^2_{t+1}} \mathbb{1}_{\{\zeta_{t+1}(p^1_t, p^2_t, y^1_{t+1}, y^2_{t+1}) = z_{t+1}\}} \, \mathbb{1}_{\{\xi^1_{t+1}(p^1_t, y^1_{t+1}) = p^1_{t+1}\}} \, \mathbb{1}_{\{\xi^2_{t+1}(p^2_t, y^2_{t+1}) = p^2_{t+1}\}} \\
&\qquad \times P(y^1_{t+1}, y^2_{t+1} \mid x_{t+1}) \, P(x_{t+1} \mid x_t) \, \pi_t(x_t, p^1_t, p^2_t).
\end{aligned}
\tag{62}
$$

Note that in addition to the arguments on the left side of the conditioning in (62), we only need $\pi_t$ 
to evaluate the right hand side of (62). 

We can now consider the common information based belief at time $t+1$, 

$$
\begin{aligned}
\pi_{t+1}(x_{t+1}, p^1_{t+1}, p^2_{t+1}) &= P(x_{t+1}, p^1_{t+1}, p^2_{t+1} \mid c_{t+1}) \\
&= P(x_{t+1}, p^1_{t+1}, p^2_{t+1} \mid c_t, z_{t+1}) \\
&= \frac{P(x_{t+1}, p^1_{t+1}, p^2_{t+1}, z_{t+1} \mid c_t)}{P(z_{t+1} \mid c_t)}.
\end{aligned}
\tag{63}
$$

The numerator and denominator of (63) are both marginals of the probability in (62). Using (62) 
in (63) gives $\pi_{t+1}$ as a function of $\pi_t, z_{t+1}$. 
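
The two-step computation in (62) and (63) translates directly into code for a finite model with 
an uncontrolled state. The sketch below is illustrative only: the function name `next_belief`, the 
dictionary encoding of the belief, and the callables `trans`, `obs`, `zeta`, `xi1` and `xi2` 
(standing in for $P(x_{t+1} \mid x_t)$, $P(y^1_{t+1}, y^2_{t+1} \mid x_{t+1})$, $\zeta_{t+1}$, 
$\xi^1_{t+1}$ and $\xi^2_{t+1}$) are assumptions, not code from the paper. 

```python
from collections import defaultdict
from itertools import product


def next_belief(pi, z_next, trans, obs, zeta, xi1, xi2, states, obs_space):
    """Compute pi_{t+1} from pi_t and the observed increment z_{t+1}, as in (62)-(63).

    pi        : dict {(x, p1, p2): prob}
    z_next    : realized increment in common information
    trans     : callable (x_next, x) -> P(x_next | x)   (uncontrolled state)
    obs       : callable (y1, y2, x_next) -> P(y1, y2 | x_next)
    zeta      : callable (p1, p2, y1, y2) -> z
    xi1, xi2  : callables (p_i, y_i) -> next private information
    """
    states = list(states)
    obs_space = list(obs_space)
    joint = defaultdict(float)  # numerator of (63): P(x', p1', p2', z | c_t)
    for (x, p1, p2), weight in pi.items():
        for x_next, (y1, y2) in product(states, obs_space):
            if zeta(p1, p2, y1, y2) != z_next:
                continue  # indicator on the common information increment in (62)
            prob = weight * trans(x_next, x) * obs(y1, y2, x_next)
            if prob > 0.0:
                joint[(x_next, xi1(p1, y1), xi2(p2, y2))] += prob
    total = sum(joint.values())  # denominator of (63): P(z | c_t)
    if total == 0.0:
        raise ValueError("z_next has zero probability under pi")
    return {key: value / total for key, value in joint.items()}
```

As in the proof, the only inputs are the current belief and the increment; no strategy enters 
the update. 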